MERS-Coronavirus Molecular Epidemiology and Genetic Analysis - Origin and Evolution

MERS-Coronavirus Molecular Epidemiology and Genetic Analysis

This is an update of an older page describing an analysis based on 13 sequences, here.

There are now more than 70 published complete or partial MERS-CoV genome sequences, from both human cases and, more recently, camels.

A full list of acknowledgements and links for the sequences used is available here.

How do the recent camel MERS-CoV sequences fit into the tree of human cases?

There are 73 MERS-CoV sequences now available. The majority are from human cases in Saudi Arabia, published in Cotten et al 2013 or Cotten et al 2014 (see the page linked above for the full list). In addition, there are 6 partial ORF1ab/spike sequences from camels in KSA published in Alagaili et al 2014, and a partial camel genome associated with 2 Qatari human cases from Haagmans et al 2013. Finally, there is an almost complete genome from a camel in Egypt, a country with no recorded human cases of MERS-CoV, published by Chu et al 2014.

Figure 1|Maximum likelihood phylogeny of complete and partial genomes. Shaded areas represent cases that have a known or inferred epidemiological link. Camel sequences have red labels. The tree was rooted using the Egyptian camel sequence, as the most divergent.

When did these strains share a common ancestor?

In previous analyses I used a simple root-to-tip regression method to estimate the rate of evolution and the date of the most recent common ancestor. This estimated the rate of molecular evolution to be 1.48 × 10⁻³ subst/site/year and placed the most recent common ancestor of the sampled viruses in mid-2011. As I mentioned then, this was a relatively crude estimate based on a limited set of sequences. Recent papers (Cotten et al 2013 and Cotten et al 2014) have reported over 60 genome sequences from the Kingdom of Saudi Arabia (KSA) and applied a more sophisticated and statistically informative Bayesian phylogenetic approach using BEAST. The latter paper estimates a slightly slower rate of 1.12 × 10⁻³ substitutions/site/year (95% interval 8.76 × 10⁻⁴ to 1.37 × 10⁻³).
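As a rough illustration of the root-to-tip idea, the sketch below fits a straight line to root-to-tip distance against sampling date. The slope is the rate estimate and the x-intercept is the date at which divergence from the root is zero, which serves as the estimate of the common ancestor's date. The function names and the sample dates and distances are hypothetical, generated from a known rate so the fit can be checked. They are not the MERS-CoV data, and this is not necessarily how the original regression was run.

```python
from datetime import date


def decimal_year(d: date) -> float:
    """Convert a calendar date to a decimal year (for example mid-2012 is about 2012.5)."""
    start = date(d.year, 1, 1)
    end = date(d.year + 1, 1, 1)
    return d.year + (d - start).days / (end - start).days


def root_to_tip_regression(sample_times, root_to_tip_distances):
    """Least-squares fit of root-to-tip distance against sampling time.

    Returns (rate, tmrca): the slope in subst/site/year and the x-intercept,
    i.e. the time at which the fitted divergence from the root is zero.
    """
    n = len(sample_times)
    mean_t = sum(sample_times) / n
    mean_d = sum(root_to_tip_distances) / n
    sxx = sum((t - mean_t) ** 2 for t in sample_times)
    sxy = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(sample_times, root_to_tip_distances))
    rate = sxy / sxx
    intercept = mean_d - rate * mean_t
    tmrca = -intercept / rate
    return rate, tmrca


# Hypothetical inputs: distances generated from an assumed rate and root date.
true_rate = 1.5e-3   # subst/site/year (illustrative only)
true_root = 2011.5   # decimal year (illustrative only)
dates = [date(2012, 4, 15), date(2012, 9, 1), date(2013, 2, 10), date(2013, 5, 20)]
times = [decimal_year(d) for d in dates]
distances = [true_rate * (t - true_root) for t in times]

rate, tmrca = root_to_tip_regression(times, distances)
print(f"rate = {rate:.3e} subst/site/year, TMRCA = {tmrca:.2f}")
```

With real data the points scatter around the line, and because tips share ancestry they are not independent observations, which is one reason the regression gives only a crude estimate with no proper interval.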

Since that paper was published, one additional genome sequence has become available. It is interesting because it is from a camel in Egypt and is not linked to any known human cases (unlike the recent Qatari camel sequence). Here I have added this sequence to an analysis of 27 human cases, subsampled from the 71 available so that each known or suspected epidemiological cluster has only one representative.

Technical details of analysis. This is an analysis of 28 complete or partial genomes sampled from April 2012 to late 2013. Only the protein-coding regions are used, aligned on codon boundaries, for a total of 29,361 nucleotides or 9,787 codons. BEAST v1.8 was used to construct a tree and estimate evolutionary parameters, including the average rate of evolution. An uncorrelated lognormal relaxed clock was used to model rate variation among lineages, the SRD06 model was used to model variation among sites and codon positions, and an exponential growth coalescent model was used as a prior on the tree. An MCMC chain of 100M steps was run, thinned to 10,000 samples, and the first 10% discarded as burn-in. The remaining 9,000 samples were used to construct a maximum clade credibility tree and to summarize the model parameters as posterior probability densities. A second run using a strict molecular clock was performed in the same fashion for comparison.
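The bookkeeping in the paragraph above can be checked with a short sketch: the thinning interval implied by 100M steps and 10,000 logged samples, the 10% burn-in, and a 95% interval computed from what remains. The trace here is a hypothetical stand-in drawn from a normal distribution, not a real BEAST log, and the helper names are my own. The interval is computed as a highest posterior density (HPD) interval, which is an assumption about the summary rather than something stated in the text.

```python
import random


def discard_burnin(samples, burnin_percent=10):
    """Drop the first `burnin_percent` percent of the logged samples."""
    n_burnin = len(samples) * burnin_percent // 100
    return samples[n_burnin:]


def hpd_interval(samples, mass=0.95):
    """Shortest interval that contains `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    n_in = max(1, int(round(mass * n)))
    # Slide a window of n_in consecutive sorted values and keep the narrowest.
    best = min(range(n - n_in + 1),
               key=lambda i: ordered[i + n_in - 1] - ordered[i])
    return ordered[best], ordered[best + n_in - 1]


chain_length = 100_000_000                     # MCMC steps, from the text
n_logged = 10_000                              # samples kept after thinning
sampling_interval = chain_length // n_logged   # one sample every 10,000 steps

# Alignment size: 29,361 nucleotides on codon boundaries gives 9,787 codons.
n_codons, remainder = divmod(29_361, 3)

# Hypothetical trace standing in for the rate column of a log file.
rng = random.Random(1)
trace = [rng.gauss(1.2e-3, 0.2e-3) for _ in range(n_logged)]

kept = discard_burnin(trace)
low, high = hpd_interval(kept)
print(sampling_interval, len(kept), n_codons)
print(f"95% HPD: {low:.2e} to {high:.2e}")
```

In practice the burn-in fraction is chosen by inspecting the trace, and effective sample sizes matter more than the raw count of 9,000 retained samples.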

The resulting tree and timescale of the epidemic are shown in Fig 2. Superimposed on this figure is the estimate of the time of the most recent common ancestor (TMRCA) of all the MERS-CoV isolates, drawn as a probability density. The two earliest viruses, EMC_2012 from Bisha in KSA and Jordan-N3 from Al-Zarqa in Jordan, group as a separate lineage from all the more recently sampled viruses, and the Egyptian camel sequence falls at the root of the tree despite its much more recent date. The common ancestor of the whole tree most likely existed towards the end of 2010 (but could have been as much as a year either way with 95% probability). The recent lineage (referred to as 'clade B' in Cotten et al 2013) dates from the beginning of 2012.

Figure 2|Time calibrated phylogenetic tree of 28 publicly available human MERS-CoV genome sequences and one camel MERS-CoV from Egypt. A BEAST phylogeny with posterior probability density estimates of the time of most recent common ancestor (TMRCA) of all sequences (green) and the more recent, predominantly KSA, lineage (blue). Also shown in red is a time series of recorded human cases. Al-Hasa_2_2013 is used as a representative of the nosocomial outbreak in Al-Hasa, and Bisha_1_2012 is removed because of its close similarity to Riyadh_1_2012 despite its differing time and location. The numbers by certain nodes are posterior probabilities; only those <0.95 are shown.

The estimated rate of evolution is very similar to that of Cotten et al. (2014), with which this study shares most of its data. This analysis differs in that it used a relaxed molecular clock, allowing rates of evolution to vary from branch to branch. This more general model has greater uncertainty in the evolutionary rate but is otherwise compatible with the strict molecular clock model, which assumes a single rate of evolution across the whole tree (see Fig 3).
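To make the difference between the two clock models concrete, here is a conceptual sketch. Under a strict clock every branch shares one rate. Under an uncorrelated lognormal relaxed clock each branch rate is drawn independently from a lognormal distribution around the mean rate. The spread parameter `sigma`, the number of branches and the function names are assumptions for illustration, and this is not BEAST's actual implementation.

```python
import math
import random


def strict_clock_rates(mean_rate, n_branches):
    """Strict clock: every branch evolves at the same rate."""
    return [mean_rate] * n_branches


def relaxed_clock_rates(mean_rate, sigma, n_branches, rng):
    """Uncorrelated lognormal clock: independent lognormal draw per branch.

    mu is chosen so the lognormal distribution has mean `mean_rate`,
    using mean = exp(mu + sigma^2 / 2).
    """
    mu = math.log(mean_rate) - 0.5 * sigma ** 2
    return [rng.lognormvariate(mu, sigma) for _ in range(n_branches)]


mean_rate = 1.19e-3   # subst/site/year, the relaxed-clock estimate quoted in the text
sigma = 0.5           # hypothetical spread of branch rates on the log scale
n_branches = 20_000   # deliberately large so the sample mean is stable

rng = random.Random(42)
strict = strict_clock_rates(mean_rate, n_branches)
relaxed = relaxed_clock_rates(mean_rate, sigma, n_branches, rng)

relaxed_mean = sum(relaxed) / len(relaxed)
print(f"strict:  min = {min(strict):.2e}, max = {max(strict):.2e}")
print(f"relaxed: min = {min(relaxed):.2e}, max = {max(relaxed):.2e}, "
      f"mean = {relaxed_mean:.2e}")
```

Because the relaxed model has to estimate the spread of rates as well as their mean, it is natural for its interval on the mean rate to be wider than under the strict clock.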

Figure 3|Estimated rate of molecular evolution. The posterior probability density of the estimated rate of evolution for 13 genome sequences under the relaxed (blue) and strict (red) molecular clock.

The parameter estimates and their 95% credible intervals are given in this table:

Rate: 1.19 × 10⁻³ subst/site/year (95% interval: 0.84 × 10⁻³ to 1.57 × 10⁻³)
tMRCA: Oct 2010 (95% interval: Sep 2009 to Sep 2011)
tMRCA (excluding EMC & Jordan-N3): Mar 2012 (95% interval: Nov 2011 to Jun 2012)

Comparison with previous studies

There are now 7 studies (including the ones on this site) that have estimated the rate of evolution of MERS-CoV. These differ in the number of sequences used and in the time span between the earliest and latest sequences. Both of these factors can affect the accuracy and precision of the rate estimates. The table below summarizes the results from the different studies (including both the strict clock and relaxed clock for the present study).

Study | Model type | #Seqs | Earliest sequence | Latest sequence | Timespan (days) | Rate mean [95% CI] (× 10⁻³ subst/site/year) | TMRCA [95% CI] | Ingroup date [95% CI]
Drosten et al | regression | 5 | JordanN3_2012 | | 341 | 1.52 | Jun 2011 | n/a
This site | regression | 6 | JordanN3_2012 | | 389 | 1.48 | Jun 2011 | n/a
Cotten et al 2013 | relaxed | 11 | Bisha_1_2012 | | 350 | 0.63 [0.14, 1.1] | n/a | Jul 2011 [Jul '07, Jun '12]
Cauchemez et al | clock | 12 | JordanN3_2012 | | 415 | 1.0 [0.68, 1.3] | Oct 2010 [Nov '08, Jun '11] |
This site | clock | 13 | JordanN3_2012 | | 546 | 0.99 [0.73, 1.26] | Nov 2010 [May '10, Jun '11] | Feb 2012 [Nov '11, May '12]
This site | relaxed | 13 | JordanN3_2012 | | 546 | 1.02 [0.64, 1.43] | Nov 2010 [Oct '10, Oct '11] | Feb 2012 [Aug '11, Jun '12]
Cotten et al. (2014) | relaxed | 42 | Bisha_1_2012 | | 481 | 1.12 [0.88, 1.37] | n/a | Mar 2012 [Dec '11, Jun '12]
This study | clock | 29 | JordanN3_2012 | | 565 | 0.97 [0.78, 1.16] | Jun 2010 [Oct '10, Feb '11] | Feb 2012 [Nov '11, May '12]
This study | relaxed | 29 | JordanN3_2012 | | 565 | 1.19 [0.84, 1.57] | Oct 2010 [Sep '10, Sep '11] | Mar 2012 [Nov '11, May '12]

The effects of the number of sequences and of the time span on the rate estimates are shown in Fig 4 and Fig 5, respectively.

Figure 4|Estimated rate of evolution by number of sequences in study. This is a plot of the estimated rate of evolution against the number of sequences used. The points, left to right, correspond to the order of studies in the table above. The open circle is the study presented here but with the relaxed clock model, thus giving wider credible intervals.

Figure 5|Estimated rate of evolution by time span of sampling. This is a plot of the estimated rate of evolution against the time span in days between the earliest and latest sequence used in the study. The open circle is the study presented here but with the relaxed clock model, thus giving wider credible intervals.

These results suggest that as more data are added the rate estimate is becoming more reliable, and that adding further data is unlikely to change the estimate substantially (although it may improve its precision). Likewise, the estimate of the TMRCA is unlikely to change substantially unless new, currently unsampled diversity is discovered.
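One simple way to look at precision is the width of each 95% interval. The sketch below takes the rate estimates and intervals from the comparison table (in units of 10⁻³ subst/site/year) and lists the widths in order of the number of sequences. The two regression rows are left out because they have no interval. The variable names are my own, and this is only a quick tabulation, not a formal analysis.

```python
# (study, model, n_sequences, rate_mean, ci_low, ci_high)
# Rates are in units of 1e-3 subst/site/year, copied from the comparison table.
estimates = [
    ("Cotten et al 2013", "relaxed", 11, 0.63, 0.14, 1.10),
    ("Cauchemez et al", "clock", 12, 1.00, 0.68, 1.30),
    ("This site", "clock", 13, 0.99, 0.73, 1.26),
    ("This site", "relaxed", 13, 1.02, 0.64, 1.43),
    ("Cotten et al. (2014)", "relaxed", 42, 1.12, 0.88, 1.37),
    ("This study", "clock", 29, 0.97, 0.78, 1.16),
    ("This study", "relaxed", 29, 1.19, 0.84, 1.57),
]

# Width of the 95% interval for each (study, model) pair.
widths = {
    (study, model): ci_high - ci_low
    for study, model, _, _, ci_low, ci_high in estimates
}

widest = max(widths, key=widths.get)
narrowest = min(widths, key=widths.get)

# List the rows in order of increasing number of sequences.
for study, model, n_seqs, mean, ci_low, ci_high in sorted(estimates, key=lambda row: row[2]):
    print(f"{n_seqs:>3} seqs  {study:<22} {model:<8} "
          f"mean = {mean:.2f}  width = {ci_high - ci_low:.2f}")

print("widest:", widest, "narrowest:", narrowest)
```

Within each model type the intervals are narrower for the larger data sets than for the earliest ones, and the relaxed clock intervals are wider than the strict clock intervals for the same data.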

Does this date represent emergence in humans?

The simple answer is: not necessarily. This is simply an estimate of the date when the common ancestor of all the currently sequenced human strains last existed (i.e., was present in a single host). Given that we only have viruses sampled from humans (and a few fragments from one camel), it is impossible to say which species this virus was in. Recent evidence has implicated the camel in the MERS-CoV story, so it is reasonable to consider the possibility that this animal has been the primary host in recent years. The evidence is far from conclusive and there are still mysteries to solve.

I have written some notes about the camel-source hypothesis, the evidence and the questions it poses, here.

Some older notes about animal reservoirs, mostly considering the role of bats (which are still the likely ultimate source of MERS-CoV) are here.


Cotten et al. (2014) 'Spread, Circulation, and Evolution of the Middle East Respiratory Syndrome Coronavirus', mBio 5.

Cauchemez et al. (2013), The Lancet Infectious Diseases 14: 50-56.

Cotten et al. (2013) 'Transmission and evolution of the Middle East respiratory syndrome coronavirus in Saudi Arabia: a descriptive genomic study', The Lancet 382: 1993-2002.

Guery et al. (2013), The Lancet 381: 2265-2272.

Haagmans et al. (2013) 'Middle East respiratory syndrome coronavirus in dromedary camels: an outbreak investigation', The Lancet Infectious Diseases.

Meyer B, Mueller MA, Corman VM, et al. (2014) 'Antibodies against MERS coronavirus in dromedary camels, United Arab Emirates, 2003 and 2013', Emerg Infect Dis.

Reusken et al. (2013) 'Middle East respiratory syndrome coronavirus neutralising serum antibodies in dromedary camels: a comparative serological study', The Lancet Infectious Diseases 13: 859-866.