The MrBayes software package is operated from the application console (a special terminal just for MrBayes provided by the package itself); you can also use it with the unix/linux terminal if you are already familiar with that.
Bayesian Phylogenetic inference using MrBayes consists o the following steps:
cdetc) and enter
mb. If you get the error “Command not found” or the like, you can make an alias to the programme using
alias mb=/Applications/MrBayes\ 3/mb(or add
mbto your path if you know how to do that). See also my crash course in commandline computing
set dir=command, e.g.
help(plus return to tell the console that you’re done), and admire the wall of text that this produces
help executetell you about the required format for the
executecommand. Note that commands may be abbreviated to shortest unambiguous spelling (e.g. “exe” instead of “execute”)
quitif you want to leave the programme and start again.
execute Aslian28.nexin the MrBayes console. If your working directory is not the same as the directory where this file is located then you will have to specify the full path to this file.
Minimally you have to set the model (using
lset) and the priors (using
We’ll set two things, the substitution model and the rates model. The substition model is set by nst, and has three possible values (1, 2 or 6).
Most of these models are designed for nucleotides (A, C, T, and G); only the F81 model is suitable for binary (‘restriction’) data. F81 is parameterised as a single rate for q01, while the rate q10 is derived from the relative frequencies of 0 versus 1 in the data.
The gamma rates model assumes that features can be classified into a certain number of rates classes.::
By default MrBayes uses four rate categories, but if you want a different number you can set it (note that processing time increases proportionally to the number of categories). Here I speed things up by reducing the number of classes to two::
You can set all these things in one go if you want (I’m going to put
ngammacat back to four, because it’s a good compromise)::
lset nst=1 rates=gamma ngammacat=4
For better for worse MrBayes chooses quite workable priors all by itself. But you should really aim to work out what MrBayes is doing, and make informed decisions of your own.
Relevant priors include information about
unconstrainedsetting is best
brlenspr=exponential(100)(‘testing’ here means ‘run analyses with the different settings and see whether it significantly influences the likelihood’).
To review the model settings before the analysis you can enter
You set the parameters of the Monte Carlo Markov Chain search with
mcmcp. The arguments include:
ngen: The number of MCMC generations MrBayes will run before pausing
nruns: The number of independent MCMC analyses run by MrBayes
nchains: The number of Metropolis-coupled chains for each independent MCMC analysis
samplefreq: The frequency with which MrBayes writes output to files
printfreq: The frequency with which MrBayes prints output to the screen
filename: The root name for output file names
mcmcp filename=mytest tells MrBayes to write the output to files with the prefix
Once we’ve got some output to look at we’ll probably want to change
samplefreq, but you can’t decide this ahead of time. You can also set these parameters on the
mcmc command, which launches the Monte Carlo Markov Chain search::
mcmc ngen=1000000 samplefreq=1000
Take a look at how fast the data is scrolling past (rows are numbered) and estimate how long the run will take to complete. You probably have time for a cup of coffee.
When the chain comes to an end (and you are running MrBayes interactively rather than controlling it by a script) the programme gives you the opportunity to continue the search for longer::
Average standard deviation of split frequencies: 0.030479 Continue with analysis? (yes/no):
The rule of thumb here is that if the average standard deviation of split frequencies is less than 0.01 then you’ve probably run the chain long enough. Otherwise enter yes and let it carry on.
The other way to check convergence is to look at the likelihood trace. You can do this by loading the log file into Tracer http://tree.bio.ed.ac.uk/software/tracer/, or if you are an R user, you read it into R and plot the values::
# Read in the the logfile, skipping lines starting with "[" R> logfile <- read.delim("Aslian28.nex.run1.p", comment.char="[") R> plot(logfile$LnL, type="l") # plot the trace as a line # get rid of the first e.g. 500 samples to see how stationary # the run is after the burn-in R> plot(logfile$LnL[-c(1:500)], type="l")
You can then repeat the analysis using other evolutionary models, and compare how they fit in order to select the best model to describe your data.
Did you notice the warning when you read in the data to MrBayes?:
.. WARNING:: There are 69 characters incompatible with the specified coding bias. These characters will be excluded.
Let’s use R to try to work out what’s going on. We’ll read in the data, and look at the values::
R> library(ape) R> d <- read.nexus.data("Aslian28.nex") # convert the data to a data frame R> d <- as.data.frame(d) # transpose rows and columns R> d <- t(d) # get the row names R> lx <- rownames(d) # apply the as.numeric function to the columns (option 2) of d R> d <- apply(d, 2, as.numeric) # put the row names back (I don't know why they disappeared) R> rownames(d) <- lx # apply the sum.numeric function to the columns (option 2) of d # in order to count how many ones there are in each column R> ones <- apply(d, 2, sum) # there are 68 columns with all zero R> table(ones) ones 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ... 68 13 147 78 45 39 24 19 18 13 8 12 12 12 6 10
So there are 68 columns with all zero in the database. The warning message refered to 69::
# check how many languages R> nrow(d)  28 # there is one column with all ones! R> table(ones)[c("28")] 28 1
And 68 + 1 = 69. This must be it.
From the MrBayes manual:
A problem with some binary data sets, notably restriction sites, is that there is an ascertainment (coding) bias such that certain characters will always be missing from the observed data. It is impossible, for instance, to detect restriction sites that are absent in all of the studied taxa. MrBayes corrects for this bias by calculating likelihoods conditional on the unobservable characters being absent (Felsenstein, 1992). The ascertainment (coding) bias is selected using
lset coding. There are five options: (1) there is no bias, all types of characters could, in principle, be observed (
lset coding=all); (2) characters that are absent (state 0) in all taxa cannot be observed (
lset coding=noabsencesites); (3) characters that are present (state 1) in all taxa cannot be observed (
lset coding=nopresencesites); (4) characters that are constant (either state 0 or 1) in all taxa cannot be observed (
lset coding=variable); and (5) only characters that are parsimony-informative have been scored (
lset coding=informative). For restriction sites it is typically true that all-absence sites cannot be observed, so the correct coding bias option is
There are lots of ways to look at your results
Go through the tree set and count how many times each two-way split in the data occurs. Rank these according to frequency, and starting from the top of the ranking build a single tree using every split that doesn’t conflict with one that has already been added to the tree.
a NeighborNet splits graph gives a good visual summary of the consensus and conflict within a tree:
Maximum clade credibility tree (using TreeAnnotator, part of the BEAST package)
Go through the tree set and count how many times each two-way split in the data occurs. Treat these as the likelihoods of each split. Now for each tree in the sample, calculate the product of the likelihoods of each of the splits in the tree. The tree which scores highest can be treated as the most representative tree in the sample.
If you do choose to reduce your sample to a single tree, use R or FigTree to make a nice picture out of it.
Reducing your tree sample to a single tree is a convenience for visualization, not a part of the analysis. You should do any calculations that you're interested in on the basis of the complete sample.
Rather than typing in all the MrBayes commands each time you tweak an analysis, it is much more reliable to use a “MrBayes” block in your nexus file. This block starts with
begin mrbayes; and (like any other nexus block) ends with
end;. The MrBayes block can contain any command that you type into the MrBayes console; but don’t forget to end them with a semicolon (the nexus standard — this also allows for more readable, multiline commands. It’s good practice to include comments to yourself explaining what the control file is supposed to do. Nexus comments go in
[ square brackets ]. Here is an example of somewhat elaborate MrBayes block::
[ topology constrained to include South and Central Aslian ] begin mrbayes; log start filename=test.log replace; constraint southaslian = Semaq_Beri_Brw Semaq_Beri_Jaboy Semelai Mah_Meri; constraint centralaslian = Semnam_Bal Semnam_Malau Lanoh_Kertei Temiar_Kelantan Temiar_Perak Semai_Ringlet Semai_Kampar; prset statefreqpr=fixed(empirical) brlenspr=unconstrained:exponential(0.05); prset topologypr = constraints(southaslian,centralaslian); lset nst=1 rates=gamma ngammacat=4 coding=noabsencesites; mcmc ngen=1500000 printfreq=1000 samplefreq=1500 nchains=4 savebrlens=yes nruns = 2 ; sumt burnin=1000; sump; plot filename=Aslian28.nex.run1.p match=all; plot filename=Aslian28.nex.run2.p match=all; log stop; end;
You can then run this file in MrBayes using the command line
mb myfile.nex (assuming that the file with the data in it is called
myfile.nex. If you want to be left with an interactive console once everything has been run, use the
-i flag, i.e.
mb -i myfile.nex. Another neat way to do it is to keep the data in one nexus file (e.g. like the
Aslian28.nex file used in the exercises above, and then make a control file with just the
mrbayes block. Include
execute Aslian28.nex; as the first line, after
begin mrbayes;. This way you can have multiple control files with different settings, but all using the same data file.