|Primary Contact(s)||Created||Required Software|
|Rich Glor||9 March 2009||R|
|See Introduction||R for Phylogenetics parts I, II, and III|
Phylogenetic signal, which is recognized when closely related species tend to be more similar to one another than expected by chance, is a central concept in comparative biology. Phylogenetic signal is the reason that phylogenetic comparative methods are required in the first place: if related species weren’t more similar than expected by chance, methods that account for the expected non-independence would be unnecessary. For this reason, conducting a test of phylogenetic signal is a key prerequisite for studies of character evolution. If phylogenetic signal is low, or even absent entirely, our ability to infer patterns of character evolution will be limited. In extreme cases, the absence of phylogenetic signal may be used as justification for relying on standard statistical analyses that avoid the annoyance of incorporating the phylogeny.
A variety of methods have been proposed as tests of phylogenetic signal, and methods are available for both quantitative (continuously coded) and qualitative (discretely coded) characters. We will focus here on maximum likelihood based methods. Two ML-based methods are implemented in R. One uses a clever shortcut to ask whether the phylogeny helps explain the distribution of character states among extant taxa. The second uses simulations of evolution under Brownian motion to ask whether the observed taxa are more similar to one another than expected.
A. Testing phylogenetic signal using Pagel’s Lambda
1. If a character exhibits phylogenetic signal, we expect that the phylogeny will be helpful in explaining the distribution of character values among terminal taxa. Pagel introduced a simple test statistic that can be used to test this prediction. This test statistic is actually a tree transformation parameter that has the effect of gradually eliminating phylogenetic structure. Lambda does this by multiplying the off-diagonal elements of the variance/covariance matrix describing your tree topology and branch lengths by values between 0 and 1. Lambda values of 1 correspond with the original, untransformed branch lengths, whereas at the other extreme a lambda value of 0 corresponds with the complete absence of phylogenetic structure. Let's take a look at what Lambda is doing by applying this transformation parameter to our tree using the lambdaTree function of geiger. Let's look at trees with Lambda=0.5 and Lambda=0 using the following commands:
lambdaTree(anolisComparativeTree, 0) -> anolisLambda0
lambdaTree(anolisComparativeTree, 0.5) -> anolisLambda0.5
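To make the arithmetic behind Lambda concrete, here is a minimal sketch in base R. It does not use geiger or the Anolis tree: the three-taxon variance/covariance matrix and the helper name lambda_vcv are made up purely for illustration. The sketch multiplies only the off-diagonal elements by Lambda and leaves the diagonal alone, as described above.

```r
# Hypothetical variance/covariance matrix for three taxa (A, B, C).
# Diagonal: root-to-tip distance; off-diagonal: shared branch length.
tree_vcv <- matrix(c(1.0, 0.6, 0.2,
                     0.6, 1.0, 0.2,
                     0.2, 0.2, 1.0),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Illustrative helper (not a geiger function): scale the off-diagonal
# elements by lambda while keeping the diagonal unchanged.
lambda_vcv <- function(m, lambda) {
  stopifnot(lambda >= 0, lambda <= 1)
  out <- m * lambda
  diag(out) <- diag(m)
  out
}

vcv_half <- lambda_vcv(tree_vcv, 0.5)  # shared history halved
vcv_zero <- lambda_vcv(tree_vcv, 0)    # no shared history at all

print(vcv_half)
print(vcv_zero)
```

With Lambda=0 every off-diagonal element becomes 0, which is the matrix of a star phylogeny in which no two taxa share any history.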
2. Now try plotting all three trees: your original tree, your tree with lambda=0.5 and your tree with lambda=0. To look at all three of these trees at the same time, we're going to use mfcol, a graphical parameter that is set through R's par function. mfcol works by dividing your plotting area into a table with multiple rows and/or columns into which plots can be placed. By typing par(mfcol=c(1,3)) we can generate a plotting area with one row and three columns.
par(mfcol=c(1,3))
plot(anolisComparativeTree)
plot(anolisLambda0.5)
plot(anolisLambda0)
You should obtain trees that look like this, with the original tree on the left, the tree made with lambda=0.5 in the middle and the tree with lambda=0 on the right:
You can see clearly now what it means to eliminate phylogenetic structure using Lambda.
3. Now that we have a basic understanding of what Lambda is actually doing, we can use maximum likelihood optimization to investigate the degree to which our trait exhibits phylogenetic signal. One basic way to do this is to optimize the value of Lambda using ML. We can do this using the fitDiscrete function of geiger.
fitDiscrete(anolisComparativeTree, micro, treeTransform="lambda")
After you input this command, you should see a scale bar tracking the progress of the analysis. This happens because the analysis is actually being repeated several times to ensure that your ML search isn't trapped on a non-optimal peak. Ultimately you should get some output that looks something like this:
Finding the maximum likelihood solution
[0 50 100]
[....................]
$Trait1
$Trait1$lnl
-86.76405
$Trait1$q
[,1]
[1,] -0.03717188
$Trait1$treeParam
0.9943398
$Trait1$message
"R thinks that this is the right answer."
We're particularly interested in two numbers in this output. First, the value listed under $Trait1$treeParam is the value of Lambda estimated by ML. As you can see here, this value is very close to 1, indicating that our ML solution is one with phylogenetic signal. We're also interested in the value listed under $Trait1$lnl. This is the log likelihood for this particular scenario (the value itself is negative, and values closer to zero indicate a better fit). We can use this value to conduct likelihood ratio tests of alternative scenarios.
4. One basic question that we're interested in addressing is whether significant phylogenetic signal exists in our dataset. The easiest way to test this hypothesis is to obtain the log likelihood from a tree without phylogenetic signal and compare this value to that obtained from the original topology. To get a likelihood value for our trait on the tree without signal, we can type the following:
This analysis should produce an ML score of -149.8383. To compare this score to that of the tree with phylogenetic signal, we can use the Akaike Information Criterion or the likelihood ratio test.
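The comparison can be sketched in base R using the two log likelihoods reported above. The parameter counts are assumptions rather than something this tutorial states: the sketch takes the model with Lambda estimated to have two free parameters (the rate q and Lambda) and the model on the Lambda=0 tree to have one (q only), so the likelihood ratio test has one degree of freedom.

```r
# Log likelihoods taken from the two analyses described in the text.
lnl_signal    <- -86.76405   # Lambda estimated by ML
lnl_no_signal <- -149.8383   # tree with Lambda = 0

# Assumed numbers of free parameters (see the note above).
k_signal    <- 2  # q and Lambda
k_no_signal <- 1  # q only

# Likelihood ratio test: twice the difference in log likelihood,
# compared against a chi-square distribution with df equal to the
# difference in the number of parameters.
lrt     <- 2 * (lnl_signal - lnl_no_signal)
p_value <- pchisq(lrt, df = k_signal - k_no_signal, lower.tail = FALSE)

# Akaike Information Criterion: AIC = 2k - 2 lnL (lower is better).
aic_signal    <- 2 * k_signal    - 2 * lnl_signal
aic_no_signal <- 2 * k_no_signal - 2 * lnl_no_signal

print(c(LRT = lrt, p = p_value))
print(c(AIC_signal = aic_signal, AIC_no_signal = aic_no_signal))
```

Under these assumptions the test statistic is far larger than the 5% critical value for one degree of freedom (about 3.84), and the model with signal also has the lower AIC, so both criteria favor phylogenetic signal in this trait.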
5. Try doing the same Lambda-based analyses you just did with microhabitat using SVL.
B. Testing phylogenetic signal using Blomberg et al.'s K
1. To calculate K, you can use the Kcalc command of the picante package:
One key to getting this to work properly is that you need to have your taxa properly aligned. Specifically, the taxa in your character dataset need to be in the same order as the tips of your tree. To do this, we need to re-sort our SVL data. We can do so by placing a simple indexing expression after our SVL vector, which reorders the values by taxon name. This is why anolisComparativeTree$tip.label follows SVL.
2. To ask whether significant signal exists, we can use a related metric and need some simulations. Here, we are interested in looking at the observed and expected variance in independent contrasts calculated at nodes in our tree. This is implemented using the phylosignal function of picante.
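The reordering step can be illustrated with a small base R sketch. The taxon names and SVL values below are invented for the example, and tip_labels is a stand-in for anolisComparativeTree$tip.label. Indexing a named vector by a character vector returns the values in the order of that character vector, which is all the sorting amounts to.

```r
# Hypothetical tip order, standing in for anolisComparativeTree$tip.label.
tip_labels <- c("species_c", "species_a", "species_b")

# Hypothetical named trait vector in a different order than the tips.
svl <- c(species_a = 4.1, species_b = 3.7, species_c = 5.2)

# Every tip needs a trait value before reordering.
stopifnot(all(tip_labels %in% names(svl)))

# Index by name: values come back in the order of tip_labels.
svl_sorted <- svl[tip_labels]
print(svl_sorted)

# With picante loaded and the tutorial's objects in memory, the same
# pattern would presumably be passed to Kcalc like this (assumed form,
# not run here):
# Kcalc(SVL[anolisComparativeTree$tip.label], anolisComparativeTree)
```

Checking that all tip labels are present in the names of the trait vector first is worthwhile, because indexing by a missing name silently returns NA.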