Course materials for Michael Dunn's course at the 2018 LOT Winter School
Exercise 1: Bayesian Phylogenetic Inference with MrBayes
Datafile: Aslian28.nex
Exercise 2: Constraints and multiple partitions
Datafiles: new-britain-swadesh.nex, new-britain-structural.nex
Exercise 3: Ancestral State Reconstruction and the DISCRETE test
Datafiles: Aslian28+structure1.nex, Aslian28.nex.trees, Aslian-complexity.txt, Aslian-semantics.txt
Using MrBayes, do a mixed model analysis in which one partition contains a single standard (morphological) character. Infer the ancestral states for this character. The file Aslian28+structure.nex includes the same lexical data as previously, as well as a single column coding pronoun paradigm complexity (0=low, 1=medium, 2=high).
The file starts::
begin data;
dimensions ntax=28 nchar=566;
format datatype=mixed(standard:1,restriction:2-566);
matrix
Tenen_Palian 110010001000100000000001000101 ...
Set up a MrBayes analysis. Refer to the previous exercise (Multiple Models) and think about how to set up the analysis. Run showmodel to make sure the right things are being estimated independently. Before running mcmc, add the following to infer ancestral states for partition 1:
report applyto=(1) ancstates=yes
If you think that complexity varies more or less continuously, you could also set the character in partition 1 to the ordered character type (disallowing direct changes from state 0 to state 2):
ctype ordered: 1
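To see what "disallowing direct changes" means in practice, here is a minimal Python sketch of an ordered three-state rate matrix. It is a generic continuous-time Markov chain illustration, not MrBayes's own code; the single shared rate, the branch length and the function names are assumptions made for the example. The point it illustrates is that the instantaneous rate from 0 to 2 is zero, yet a change from 0 to 2 along a branch is still possible by passing through state 1.

```python
# Sketch only: an ordered three-state model with one illustrative rate.
# The rate value and branch length are arbitrary assumptions, not estimates.

def ordered_rate_matrix(rate):
    """Return a 3x3 rate matrix Q in which direct 0<->2 changes are disallowed."""
    q = [
        [0.0, rate, 0.0],   # from state 0: only 0 -> 1
        [rate, 0.0, rate],  # from state 1: 1 -> 0 and 1 -> 2
        [0.0, rate, 0.0],   # from state 2: only 2 -> 1
    ]
    # Diagonal entries make each row sum to zero.
    for i in range(3):
        q[i][i] = -sum(q[i])
    return q


def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_probs(q, t, squarings=20):
    """Approximate P(t) = exp(Qt) as (I + Qt/2^k)^(2^k) by repeated squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    p = [[(1.0 if i == j else 0.0) + q[i][j] * scale for j in range(n)]
         for i in range(n)]
    for _ in range(squarings):
        p = mat_mul(p, p)
    return p


q = ordered_rate_matrix(1.0)
p = transition_probs(q, t=1.0)
print("instantaneous rate 0 -> 2:", q[0][2])
print("probability of 0 -> 2 over the branch:", round(p[0][2], 3))
```

The zero entries in the corners of Q are the same constraint that the BayesTraits command restrict q02 q20 0 expresses later in this exercise.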
The output (viewed using sump, “summarise parameters”) will include p(0), p(1) and p(2), the probabilities of each ancestral state at the root. Try estimating the values at other nodes as well. You’ll need to constrain the tree to contain the branches of interest:
[define South Aslian clade]
constraint southaslian = Semaq_Beri_Brw Semaq_Beri_Jaboy
Semelai Mah_Meri;
[define Central Aslian clade]
constraint centralaslian = Semnam_Bal Semnam_Malau Lanoh_Kertei
Temiar_Kelantan Temiar_Perak Semai_Ringlet Semai_Kampar;
[tell MrBayes to use these constraints as topology priors]
prset topologypr = constraints(southaslian,centralaslian);
It is also possible to do ancestral state reconstruction using BayesTraits. You start BayesTraits from the command line, using the format
BayesTraits TREEFILE DATAFILE
The tree file is a NEXUS file containing a tree sample; the data file is a plain text file with taxon labels and trait values. Two usable examples are the tree sample Aslian28.nex.trees and the data file aslian-complexity.txt, which codes pronoun complexity as a multistate character on a scale 0-1-2 (where 0 is least complex).
Load the complexity data:
BayesTraits Aslian28.nex.trees aslian-complexity.txt
BayesTraits will reconstruct the values of any nodes defined in the settings. By default only the root is defined. Other nodes are defined with the addMRCA command. To create a node called NodeName as the MRCA (most recent common ancestor) of the taxa Taxon1, Taxon2 and Taxon3, the format is:
addMRCA NodeName Taxon1 Taxon2 Taxon3
Note that MrBayes estimates ancestral states in a slightly different way: it co-estimates the phylogeny while inferring the ancestral state of the feature of interest, so the feature being reconstructed contributes phylogenetic information to the tree inference. In the BayesTraits method the tree sample is given prior to the analysis, i.e. it is prespecified and completely independent of the ancestral state reconstruction.
You can control BayesTraits using a file, rather than interactively (this definitely works on Mac and Linux, but perhaps not on Windows). The file simply contains all the things that you would otherwise type, in order, one per line. You redirect it to BayesTraits' standard input as follows:
BayesTraits TREEFILE DATAFILE < CONTROLFILE
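If you would rather generate and run command files from a script, the following Python sketch shows one way to do it. The function names and the node name southaslian are hypothetical; the executable name BayesTraits is taken from the command above and may differ on your system. The menu choices assume that 1 selects MultiState and 2 selects MCMC, as in the menus shown elsewhere in these materials.

```python
import subprocess


def command_file_text(lines):
    """Join BayesTraits inputs, one per line, in the order you would type them."""
    return "\n".join(lines) + "\n"


def run_bayestraits(treefile, datafile, command_file, executable="BayesTraits"):
    """Run BayesTraits with the command file on standard input.

    Equivalent to: BayesTraits TREEFILE DATAFILE < CONTROLFILE
    The executable name is an assumption; adjust it to your installation.
    """
    with open(command_file) as commands:
        return subprocess.run([executable, treefile, datafile],
                              stdin=commands, check=True)


# Hypothetical command sequence for the complexity data.
commands = [
    "1",  # model: MultiState
    "2",  # analysis: MCMC
    "addMRCA southaslian Semaq_Beri_Brw Semaq_Beri_Jaboy Semelai Mah_Meri",
    "run",
]
text = command_file_text(commands)
print(text, end="")
```

Write text to a file of your choice, then pass that file name to run_bayestraits together with Aslian28.nex.trees and aslian-complexity.txt.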
As well as using addMRCA to define a node to be reconstructed, you can also ‘fossilize’ a node to a particular value. The format is similar. The following example defines a node called NodeName, with its value fixed (‘fossilized’) to A, as the MRCA of the taxa Taxon1, Taxon2 and Taxon3::
fossil NodeName A Taxon1 Taxon2 Taxon3
The reversible jump hyperprior method treats the model itself as a parameter of the analysis to be estimated.
Hints:
R> d = read.delim("aslian-1.log.csv")
R> head(d) # to check what the headers are
R> sort(table(d$Model.string))
This shows you how many times each model string appears in the column. What do the most frequent ones look like? Are model strings with q02 and q20 as Z (a rate set to zero) more common? If so, you can restrict both rates to zero:
restrict q02 q20 0
The DISCRETE method is used to test whether pairs of binary features are evolving dependently or independently. It has been used in anthropology and linguistics to test for evolutionary correlations between matrilineal inheritance and cattle-keeping, between post-marital residence mode and brideprice/dowry, and for a range of correlations between aspects of word order typology. If you like biology, there’s a good sample data set that comes with the BayesTraits download too: a test for correlation between estrus advertisement and multi-male mating in primate species.
For this exercise we will be working with a sample of Aslian language trees, Aslian28.nex.trees. The state file, Aslian-semantics.txt, contains two columns of data:
Column 1: Semantic conflation of “man” and “husband” (1: the word for “man” is the same as the word for “husband”; 2: different)
Column 2: Semantic conflation of “woman” and “wife” (1: same; 2: different)
The format of this file is:
Ceq_Wong 0 0
Jah_Hut 0 0
Jahai_Banun 0 0
Jahai_Rual 1 0
Kammu 0 0
Kensiw_Kedah 1 1
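Before loading a state file it can be worth checking that every taxon has the same number of trait columns. The following Python sketch does this for the sample lines above; the function name is hypothetical and the parser assumes whitespace-separated fields, which matches the excerpt but is not checked against the BayesTraits documentation.

```python
def parse_trait_file(text):
    """Map each taxon label to its list of trait values (kept as strings)."""
    traits = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        taxon, values = fields[0], fields[1:]
        if taxon in traits:
            raise ValueError(f"duplicate taxon label: {taxon}")
        traits[taxon] = values
    return traits


# The six lines shown above, copied from the state file excerpt.
sample = """\
Ceq_Wong 0 0
Jah_Hut 0 0
Jahai_Banun 0 0
Jahai_Rual 1 0
Kammu 0 0
Kensiw_Kedah 1 1
"""

traits = parse_trait_file(sample)
# Every taxon should have the same number of trait columns.
widths = {len(values) for values in traits.values()}
print(len(traits), "taxa;", "column counts seen:", sorted(widths))
```

To check the real file, read its contents and pass them to parse_trait_file in place of sample. The taxon labels must also match those in the tree file.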
Load the data into BayesTraits with the command:
BayesTraits Aslian28.nex.trees Aslian-semantics.txt
You should see that there are quite a few different options offered:
Please Select the model of evolution to use.
1) MultiState
2) Discrete: Independent
3) Discrete: Dependant
4) Continuous: Random Walk (Model A)
5) Continuous: Directional (Model B)
6) Continuous: Regression
7) Independent contrast
We’re interested in the two “Discrete” options.
Choose 2 “Discrete: Independent” then 1 “Maximum Likelihood”. Run the analysis and look at the inferred rates. Remember, these approximate the mean values.
Hint: for reproducibility, run these analyses from command files.
Repeat this as an MCMC analysis with a reverse jump hyperprior. Add the setting rjhp exp 0 XX, where XX is a bit over the mean rate values found in the ML analysis.
Do a “Discrete: Dependent” MCMC analysis using the same settings.
Calculate the Bayes Factor for the Dependent versus Independent analysis. How confident can we be that these semantic conflations are correlated?
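As a worked illustration of the calculation, here is a minimal Python sketch. It assumes that each run gives you an estimate of the log marginal likelihood and uses the common convention log BF = 2 × (log marginal likelihood of the dependent model − log marginal likelihood of the independent model). The interpretation thresholds are a widely used rule of thumb; check the BayesTraits manual for your version. The two input values below are invented placeholders, not results from the Aslian data.

```python
def log_bayes_factor(log_ml_dependent, log_ml_independent):
    """Log Bayes factor for the dependent over the independent model."""
    return 2.0 * (log_ml_dependent - log_ml_independent)


def interpret(log_bf):
    """Rule-of-thumb reading of a log Bayes factor (an assumed convention)."""
    if log_bf < 2:
        return "weak evidence"
    if log_bf < 5:
        return "positive evidence"
    if log_bf <= 10:
        return "strong evidence"
    return "very strong evidence"


# Placeholder values for illustration only; substitute your own estimates.
example_dependent = -40.0
example_independent = -44.5

log_bf = log_bayes_factor(example_dependent, example_independent)
print(log_bf, interpret(log_bf))
```

A positive value favours the dependent model, a negative value favours the independent model.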
Here are some sample command files to run this analysis:
# test-I.cmd
2
2
logfile test-I.log
iterations 1000000
burnin 500000
rjhp exp 0 20
run
And one for the dependent model:
# test-D.cmd
3
2
logfile test-D.log
iterations 1000000
burnin 500000
rjhp exp 0 20
run