In this exercise we practise two things which are prerequisites for carrying out Ancestral State Reconstrution (see exercise 3). Constraining tree inferenc to include particular nodes is useful for incorporating prior phylogenetic knowledge into your analyeses, and is crucial if you want to reconstruct values at these specific nodes. Analysis with multiple partitions lets you use multiple kinds of data at once, and is also used in ancestral state reconstruction.
Exclude a language from the analysis:
Note that MrBayes automatically redoes the ascertainment bias check when you change the included taxa.
Set the outgroup:
Constrain the search so that it only visits regions of the tree space containing the South Aslian clade:
constraint southaslian = Semaq_Beri_Brw Semaq_Beri_Jaboy Semelai Mah_Meri prset topologypr = constraints(southaslian)
In the Ancestral States Reconstruction exercise (exercise 3), MrBayes will estimate the values for all clades defined by the
Output of the taxastat command:
MrBayes > taxastat Showing taxon status: Number of taxa = 28 (of which 27 are included) Number of constraints = 1 1 -- Trees with 'hard' constraint "southaslian" are infinitely more probable than those without Constraints Taxon Inclusion 1 ------------------------------------------------ 1 (Tenen_Palian) -- Included . 2 (Tenen_Paborn) -- Included . 3 (Kensiw_Perak) -- Included . 4 (Kensiw_Kedah) -- Included . 5 (Kintaq) -- Included . 6 (Jahai_Banun) -- Included . 7 (Jahai_Rual) -- Included . 8 (Menriq_Lah) -- Included . 9 (Menriq_Rual) -- Included . 10 (Batek_Teh_Taku) -- Included . 11 (Batek_Teh_Lebir) -- Included . 12 (Batek_T) -- Included . 13 (Batek_Deq_Terengganu) -- Included . 14 (Batek_Deq_Koh) -- Included . 15 (Ceq_Wong) -- Included . 16 (Semnam_Bal) -- Included . 17 (Semnam_Malau) -- Included . 18 (Lanoh_Kertei) -- Included . 19 (Temiar_Kelantan) -- Included . 20 (Temiar_Perak) -- Deleted . 21 (Semai_Ringlet) -- Included . 22 (Semai_Kampar) -- Included . 23 (Jah_Hut) -- Included . 24 (Mah_Meri) -- Included * 25 (Semaq_Beri_Brw) -- Included * 26 (Semaq_Beri_Jaboy) -- Included * 27 (Semelai) -- Included * -> 28 (Kammu) -- Included . '.' indicate that the taxon is not present in the constraint. '*' indicate that the taxon is present in the 'hard' constraint. '+' indicate that the taxon is present in the first group of a 'partial' constraint. '-' indicate that the taxon is present in the second group of a 'partial' constraint. '#' indicate that the taxon is present in the 'negative' constraint. Arrow indicates current outgroup.
The Bayes Factor test measures the ratio L(H₁D) (the likelihood of Hypothesis 1 given the data compared to the likelihood of Hypothesis 2 given the data. This presumes that ‘the data’ D is the same for each likelihood calculation. You can’t compare one hypothesis with one set of data to another hypothesis with another set of data.
To do this manually you need to calculate the harmonic mean of the log likelihood scores of your parameter settings. This is surprisingly easy:
# Write a function to calculate the harmonic mean of a vector R> harmonic.mean <- function(v) length(v) / sum(1/v) # Read in the parameter values file from the MrBayes output R> model1.p <- read.delim("model1.nex.run1.p", skip=1) # The log-likelihood scores are in the column LnL: R> harmonic.mean(model1.p$LnL)  -4634
To compare how much we can prefer model 1 to model 2, take twice the difference between the harmonic means:
R> bayes.factor <- 2 * (harmonic.mean(model1.p$LnL) - harmonic.mean(model2.p$LnL))
The interpretation of Bayes Factors are:
|0-2||Barely worth mentioning|
|2-6||Positive evidence in favour of model 1|
Source: Kass and Rafferty (1995)
Cognate presence/absence is coded using the binary restriction data Type. If you have e.g. structural data with more than two states you should use the standard data type. The exercise on Ancestral State Reconsruction contains an example.
You can include two different kinds of data evolving under two different models in a single analysis. It is defined as follows in the NEXUS file:
#NEXUS begin data; [ ... other stuff ... ] format mixed(Restriction:1-100,Standard:101-200) interleave=yes gap=- missing=?;
The numbers 1-100 and 101-201 specify the columns to be included in each partition. Then in the MrBayes block you name the character sets, define a partition (a combination of character sets), and set this partition as the active one:
begin mrbayes; charset swadesh100 = 1-100; charset structuralfeatures = 101-200; partition complete = 2: swadesh100, structuralfeatures; set partition = complete; [ use this partitioning ]
If you’re doing this by the console you can enter
showmodel to see what the current setup is for each partition. The parameters can be set for partitions individually, or in groups. The following command sets both partitions to use gamma variation for rates estimate:
lset applyto=(all) rates=gamma;
Other things to think about include the
coding setting (for ascertainment bias, which might be different in the different kinds of data you’re using). If you enter
showmodel again you can see that many parameters are shared across partitions. You probably want these to be estimated separately, which you do with
unlink statefreq=(all) revmat=(all) shape=(all) pinvar=(all);
Check the difference with
Finally, this analysis assumes that all partitions evolve at the same rate. To set MrBayes to estimate rates independently for each partition, use:
prset applyto=(all) ratepr=variable
restrictiontype). Do you want the same parameters for each one? If you are scrupulous about this (i.e. have lots of time) you will test both
invgamma, as well as various branch length priors, etc.
interleave=yesformat option you can simply paste one matrix below the other (but both within a single
matrix .... ;command. Define a partition for each, and make sure the parameter estimates are independent for each partition (using