From Lab to the World

Course materials for Michael Dunn's course at the 2018 LOT Winter School

Lecture outline and readings

Software requirements

Exercise 1: Bayesian Phylogenetic Inference with MrBayes
Datafile: Aslian28.nex

Exercise 2: Constraints and multiple partitions
Datafiles: new-britain-swadesh.nex, new-britain-structural.nex

Exercise 3: Ancestral State Reconstruction and the DISCRETE test
Datafiles: Aslian28+structure1.nex, Aslian28.nex.trees, Aslian-complexity.txt, Aslian-semantics.txt

3. Ancestral State Reconstruction and Correlated Evolution

3.1 Co-estimating structure and phylogeny

Using MrBayes, run a mixed-model analysis in which one partition contains a single standard (morphological) character, and infer the ancestral states for this character. The file Aslian28+structure.nex includes the same lexical data as previously, plus a single column coding pronoun paradigm complexity (0=low, 1=medium, 2=high).

The file starts::

  begin data;
    dimensions ntax=28 nchar=566;
    format datatype=mixed(standard:1,restriction:2-566);
    matrix
      Tenen_Palian         110010001000100000000001000101 ...

Set up a MrBayes analysis, referring to the previous exercise (Multiple Models) for how to define the partitions and their models. Run showmodel to make sure the right things are being estimated independently.

If you think that complexity varies more or less continuously, you could also set partition 1 to the ordered character type (disallowing direct changes from state 0 to state 2):

ctype ordered: 1

The output (viewed using sump, “summarise parameters”) will include p(0), p(1) and p(2), the probabilities of each ancestral state at the root. Try estimating the values at other nodes as well. You’ll need to constrain the tree to contain the branches of interest:

  [define South Aslian clade]
  constraint southaslian = Semaq_Beri_Brw Semaq_Beri_Jaboy
      Semelai Mah_Meri;
  [define Central Aslian clade]
  constraint centralaslian = Semnam_Bal Semnam_Malau Lanoh_Kertei
      Temiar_Kelantan Temiar_Perak Semai_Ringlet Semai_Kampar;
  [tell MrBayes to use these constraints as topology priors]
  prset topologypr = constraints(southaslian,centralaslian);

3.2 ASR using Multistate

It is also possible to do ancestral state reconstruction using BayesTraits. You start BayesTraits from the command line, using the format

BayesTraits TREEFILE DATAFILE

The tree file is a nexus file containing a tree sample; the data file is a simple text file with taxon labels and traits. Here are some usable examples: Aslian28.nex.trees, and a data file aslian-complexity.txt, which contains multistate coding of pronoun complexity along a scale 0-1-2 (where 0 is least complex).

Load the complexity data:

BayesTraits Aslian28.nex.trees aslian-complexity.txt

BayesTraits will reconstruct the values of any nodes defined in the settings. By default only the root is defined; other nodes are added with the addMRCA command. To create a node called NodeName, defined as the most recent common ancestor (MRCA) of the taxa named Taxon1, Taxon2 and Taxon3, the format is:

addMRCA NodeName Taxon1 Taxon2 Taxon3

Note that MrBayes has a slightly different method for estimating ancestral state values: it co-estimates the phylogeny while inferring the ancestral state of the feature of interest, so the feature being reconstructed contributes phylogenetic information to the tree inference. In the BayesTraits method the tree sample is a prior to the analysis, i.e. it is prespecified and completely independent of the ancestral state reconstruction.

You can control BayesTraits using a file, rather than interactively (this definitely works on Mac and Linux, but perhaps not on Windows). The file simply contains everything you would otherwise type, in order, one entry per line. You pipe it to BayesTraits as follows:

BayesTraits TREEFILE DATAFILE < CONTROLFILE
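
As a rough illustration of the control-file idea, the following Python sketch assembles the lines you would otherwise type and, optionally, feeds them to BayesTraits on standard input. It assumes the menu numbering used in this exercise (1 for MultiState, 2 for MCMC). The helper names, the node name SouthAslian, and the executable name passed as exe are illustrative assumptions, not part of BayesTraits. The taxa are the South Aslian languages from the MrBayes constraint in section 3.1.

```python
import subprocess


def build_control_file(model, analysis, settings):
    """Return the text of a BayesTraits command file.

    model and analysis are the menu numbers typed at the first two prompts;
    settings is a list of further command lines, without the final 'run'.
    """
    lines = [str(model), str(analysis), *settings, "run"]
    return "\n".join(lines) + "\n"


def run_bayestraits(treefile, datafile, control_text, exe="BayesTraits"):
    """Send control_text to BayesTraits on standard input.

    The executable name is an assumption; adjust it to your installation.
    """
    return subprocess.run(
        [exe, treefile, datafile],
        input=control_text,
        text=True,
        capture_output=True,
        check=True,
    )


# MultiState (1) with MCMC (2), reconstructing one extra node.
# "SouthAslian" is a hypothetical node name chosen for this sketch.
control = build_control_file(
    model=1,
    analysis=2,
    settings=[
        "addMRCA SouthAslian Semaq_Beri_Brw Semaq_Beri_Jaboy Semelai Mah_Meri",
    ],
)
print(control, end="")

# Uncomment once BayesTraits is installed and on your PATH:
# result = run_bayestraits("Aslian28.nex.trees", "aslian-complexity.txt", control)
# print(result.stdout)
```

Saving the printed text to a file and redirecting it into BayesTraits with < has the same effect as the commented-out call.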

Node fossilization

As well as using addMRCA to define a node to be reconstructed, you can also ‘fossilize’ a node to a particular value. The format is similar. The following example defines a node called NodeName with value fixed (‘fossilized’) to A, consisting of the MRCA of Taxa named Taxon1, Taxon2 and Taxon3::

fossil NodeName A Taxon1 Taxon2 Taxon3

Reversible-jump hyperprior

The reversible-jump hyperprior (the rjhp setting in BayesTraits) is a method which treats the model itself as a parameter of the analysis to be estimated.

Hints:

3.3 The DISCRETE method

The DISCRETE method is used to test whether a pair of binary features is evolving dependently or independently. It has been used in anthropology and linguistics to test for evolutionary correlations between matrilineal inheritance and cattle-keeping, between post-marital residence mode and brideprice/dowry, and for a range of correlations between aspects of word order typology. If you like biology, there’s a good sample data set that comes with the BayesTraits download too: a test for correlation between estrus advertisement and multi-male mating in primate species.

For this exercise we will be working with a sample of Aslian language trees, Aslian28.nex.trees. The state file, Aslian-semantics.txt, contains two columns of data:

Column 1: Semantic conflation of “man” and “husband” (1: the word for “man” is the same as the word for “husband”; 2: different)

Column 2: Semantic conflation of “woman” and “wife” (1: same; 2: different)

The format of this file is:

  Ceq_Wong      0    0
  Jah_Hut       0    0
  Jahai_Banun   0    0
  Jahai_Rual    1    0
  Kammu         0    0
  Kensiw_Kedah  1    1
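
Before running the DISCRETE test it can help to check how often each combination of states occurs in the data file. The Python sketch below is only an illustration: it parses the whitespace-delimited format shown above and tabulates the state pairs for the six rows of the excerpt. The function name parse_traits is an assumption made for this example.

```python
from collections import Counter

# The six rows shown in the excerpt above (not the full data file).
SAMPLE = """\
Ceq_Wong      0    0
Jah_Hut       0    0
Jahai_Banun   0    0
Jahai_Rual    1    0
Kammu         0    0
Kensiw_Kedah  1    1
"""


def parse_traits(text):
    """Map each taxon label to a tuple of its trait states (kept as strings)."""
    traits = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        taxon, *states = fields
        traits[taxon] = tuple(states)
    return traits


traits = parse_traits(SAMPLE)

# Count how many languages show each (column 1, column 2) combination.
combos = Counter(traits.values())
for combo, count in sorted(combos.items()):
    print(combo, count)
```

To tabulate the whole file, read Aslian-semantics.txt and pass its contents to parse_traits in place of SAMPLE.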

The analysis

  1. Load the data into BayesTraits with the command:

    BayesTraits Aslian28.nex.trees Aslian-semantics.txt

    You should see that there are quite a few different options offered:

       Please Select the model of evolution to use.
       1) MultiState
       2) Discrete: Independent
       3) Discrete: Dependant
       4) Continuous: Random Walk (Model A)
       5) Continuous: Directional (Model B)
       6) Continuous: Regression
       7) Independent contrast
    

    We’re interested in the two “Discrete” options.

  2. Choose 2 “Discrete: Independent” then 1 “Maximum Likelihood”. Run the analysis and look at the inferred rates. Remember, these approximate the mean values.

    Hint: for reproducibility, run these analyses from command files.

  3. Repeat this as an MCMC analysis with a reversible-jump hyperprior. Add the setting rjhp exp 0 XX, where XX is a bit over the mean rate values found in the ML analysis.

  4. Do a “Discrete: Dependent” MCMC analysis using the same settings.

  5. Calculate the Bayes Factor for the Dependent versus Independent analysis. How confident can we be that these semantic conflations are correlated?
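
The Bayes Factor in step 5 can be computed as in the Python sketch below. It assumes you have read an estimate of the marginal log-likelihood for each model from the two log files (older BayesTraits versions print a running harmonic mean, so check what your version reports). The log Bayes Factor is taken as twice the difference between the two values. The thresholds follow the common rule of thumb (under 2 weak, over 2 positive, 5 to 10 strong, over 10 very strong). The two input numbers are placeholders for illustration, not results from the Aslian data.

```python
def log_bayes_factor(log_ml_dependent, log_ml_independent):
    """Return 2 * (log marginal likelihood: dependent minus independent)."""
    return 2.0 * (log_ml_dependent - log_ml_independent)


def interpret(log_bf):
    """Rule-of-thumb label for a log Bayes Factor (a convention, not a test)."""
    if log_bf < 2:
        return "weak evidence"
    if log_bf < 5:
        return "positive evidence"
    if log_bf <= 10:
        return "strong evidence"
    return "very strong evidence"


# Placeholder values for illustration only: replace them with the
# estimates taken from test-D.log and test-I.log.
log_ml_dep = -30.0
log_ml_indep = -33.5

bf = log_bayes_factor(log_ml_dep, log_ml_indep)
print(f"log BF = {bf:.2f}: {interpret(bf)} for the dependent model")
```

A negative value means the independent model is preferred, so the sign matters as much as the size.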

Here are some sample command files to run this analysis:

  # test-I.cmd
  2
  2
  logfile test-I.log
  iterations 1000000
  burnin 500000
  rjhp exp 0 20
  run

And the equivalent file for the dependent model:

  # test-D.cmd
  3
  2
  logfile test-D.log
  iterations 1000000
  burnin 500000
  rjhp exp 0 20
  run