3. Editing XML Input File


Created by Brian Moore

Introduction

BEAUti is very helpful for translating standard NEXUS formatted files to the xml format read by BEAST, and facilitates specification of input files for relatively simple analyses. However, to realize the full potential of BEAST, we will ultimately need to become familiar with the xml file format. Here, we describe how to modify the xml file in a text editor to apply a fossil calibration to an internal node of the phylogeny.

This entails five main steps: (1) defining the clade to which the fossil has been assigned, (2) monitoring the monophyly of that clade, (3) monitoring the age of that clade, (4) enforcing the monophyly of that clade, and (5) assigning a prior probability density to describe the age of the clade based on the fossil.

Open the 'base' xml file (previously generated with BEAUti) in a text editor; in the following tutorial we used TextWrangler. Both the base xml file and the manually edited xml file are included in the example data files for the tutorial: DTE Input Files.zip. Note that the screen captures below include the line numbers in the edited xml file to help you identify the location of the corresponding edits to the xml file.

Step 1: Defining a clade

The fossil that we will use to calibrate divergence time estimates for the genus Platanus has been assigned to a subclade comprising four species, P. occidentalis, P. occidentalis, P. orientalis, and P. racemosa. Previous Bayesian analyses using MrBayes indicate strong support for this clade (i.e., Pr >0.99). So, the first thing we need to do is define a 'taxa' element that includes these four species, which we designate 'clade_1'. This is achieved by adding the xml element depicted below:
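A minimal sketch of such a 'taxa' element is given below. The taxon ids are assumptions: each 'idref' must match, character for character, the 'id' of a 'taxon' element already declared in the base xml file, and the names in your file may differ.

```xml
<!-- Sketch only: the taxon ids are assumed, not copied from the tutorial's data file -->
<taxa id="clade_1">
    <taxon idref="P_occidentalis"/>
    <taxon idref="P_orientalis"/>
    <taxon idref="P_racemosa"/>
    <!-- add the remaining member of the clade here, using its id from the base file -->
</taxa>
```

Because the clade refers to its taxa by 'idref', this element generally has to appear after the main 'taxa' block that declares those taxa.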

Step 2: Monitoring the monophyly of the specified clade

We want to monitor the monophyly of the clade that we defined above. This requires adding a 'monophylyStatistic' element to the xml file that we will designate 'clade1_monophyly', which monitors whether the members of 'clade_1' share an exclusive most recent common ancestor (MRCA) during the MCMC analysis. This is achieved using the xml element below:
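A minimal sketch of a monophyly statistic for 'clade_1' follows. It assumes that the tree model in the base xml file has the id 'treeModel', which is the usual BEAUti default; check your own file. XML ids are case-sensitive, so the id given here must be spelled identically wherever it is referenced later.

```xml
<!-- Sketch only: assumes the tree model is declared earlier with id="treeModel" -->
<monophylyStatistic id="clade1_monophyly">
    <mrca>
        <taxa idref="clade_1"/>
    </mrca>
    <treeModel idref="treeModel"/>
</monophylyStatistic>
```

The element refers to both 'clade_1' and the tree model, so it generally belongs after both of those definitions in the file.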

Step 3: Monitoring the age of the specified clade

Next, we want to monitor the age of the MRCA of 'clade_1' so that we can impose a calibration prior on this node in the phylogeny. This involves adding a 'tmrcaStatistic' element to the xml file that we will designate 'clade1_DTE', which monitors the age of the MRCA of 'clade_1' during the MCMC analysis. This is achieved via the xml element depicted below:
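A minimal sketch of the age statistic follows, again assuming that the tree model in the base xml file has the id 'treeModel'.

```xml
<!-- Sketch only: assumes the tree model is declared earlier with id="treeModel" -->
<tmrcaStatistic id="clade1_DTE">
    <mrca>
        <taxa idref="clade_1"/>
    </mrca>
    <treeModel idref="treeModel"/>
</tmrcaStatistic>
```

The structure mirrors the monophyly statistic: the same 'mrca' block identifies the node, and only the enclosing element and its id change.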

Step 4: Enforcing the monophyly of the specified clade

Because we wish to apply our fossil calibration to the MRCA of 'clade_1', it is convenient to constrain the monophyly of this clade. This involves two steps. First, we define a boolean test for the monophyly of 'clade_1', which returns a value of zero when its argument is false. Next, we return the result of this test in the 'prior' element of the xml file. This ensures that any proposed change to the topology that violates the test will have zero likelihood and so will not be accepted during the MCMC.

  1. The test for the monophyly of 'clade_1' invokes a 'booleanLikelihood' element that returns a value of 1 when its argument is true and a value of 0 when it is false. Specifically, we define a 'booleanLikelihood' statistic, which we designate 'clade1_constraint', that has as its argument the monophyly statistic, 'clade1_monophyly'. This is achieved by means of the xml element depicted below:

  2. The result of the boolean test for the monophyly of 'clade_1', 'clade1_constraint', is returned in the 'prior' element of the xml file. This is achieved via the xml element depicted below:
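The two parts of this step can be sketched as follows. The sketch assumes that the 'prior' element in the base xml file has the id 'prior', as BEAUti normally writes it; the priors already present in that element stay in place and are only indicated here by a comment.

```xml
<!-- Part 1 (sketch): boolean test wrapping the monophyly statistic -->
<booleanLikelihood id="clade1_constraint">
    <monophylyStatistic idref="clade1_monophyly"/>
</booleanLikelihood>

<!-- Part 2 (sketch): reference the test inside the existing prior element;
     id="prior" is an assumption about the base file -->
<prior id="prior">
    <booleanLikelihood idref="clade1_constraint"/>
    <!-- priors written by BEAUti remain here unchanged -->
</prior>
```

Only the single 'booleanLikelihood' reference is added to the existing 'prior' element; a second 'prior' element should not be created.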

Step 5: Calibrating the age of the specified clade

Now that we have defined and constrained the monophyly of 'clade_1', we are finally ready to apply a fossil constraint to this group of species. Formally, this involves explicit specification of our prior beliefs about the age of the MRCA of this clade as a prior probability density (i.e., a 'calibration prior'). BEAST is very flexible in this regard, as it can accommodate a variety of probability distributions (e.g., uniform, normal, lognormal, exponential) to reflect various calibration priors. Although it is technically straightforward to define a calibration prior, this aspect of the analysis is often the most difficult in practice, and should be considered carefully.

In our example data set, the fossil previously assigned to 'clade_1' has been estimated to be ~45 My of age, which defines the minimum age for this clade. Moreover, there is evidence that the stem age of the group to which Platanus belongs (the 'plane-tree' family Platanaceae) is no more than ~65 My of age, which provides a maximum age constraint for 'clade_1'. There is little additional fossil information to inform a calibration prior for this group, so we will adopt a fairly vague calibration prior to reflect this uncertainty, and constrain the age of the MRCA of 'clade_1' (which is monitored by the previously defined statistic, 'clade1_DTE') with a uniform calibration prior bounded between 45 and 65 My. This is achieved via the xml element depicted below:
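A minimal sketch of such a uniform calibration prior is given below. It assumes that node ages in the analysis are measured in millions of years, so that the bounds can be written as 45.0 and 65.0, and that the 'prior' element in the base xml file has the id 'prior'.

```xml
<!-- Sketch only: bounds assume the tree's time scale is in millions of years -->
<prior id="prior">
    <uniformPrior lower="45.0" upper="65.0">
        <tmrcaStatistic idref="clade1_DTE"/>
    </uniformPrior>
    <!-- other priors already in this element remain unchanged -->
</prior>
```

A uniform prior gives equal prior density to every age between the bounds and zero density outside them, so both bounds act as hard constraints on the age of the node.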

Almost (but not quite) ready to roll...

We've just completed a series of modifications to our XML file that define the clade to which the fossil belongs, monitor and enforce the monophyly of that clade, and monitor and constrain the age of that clade. Are we finally ready to run the BEAST analysis? Not quite yet. Attempting to do so will cause BEAST to generate the following error message:

What gives? As suggested by the error message, BEAST is attempting to initiate the MCMC from a randomly generated starting tree that violates the temporal and/or topological constraints that we have just defined, which the error message describes as an ‘initial state is incompatible with one or more 'hard' constraints (on monophyly or bounds on parameter values)’. Moreover, the error message specifically identifies the culprit, ‘clade1_constraint=-Inf’, indicating that the 'clade1_constraint' has been violated and has caused the initial state of the chain to have a log likelihood of negative infinity (that is, a likelihood of zero). The solution to this problem is to specify a starting tree that satisfies the temporal and topological constraints that we have imposed, which we will describe in the next part of the tutorial.
