II. Tree Basics

InfoInfo
Search:    
Primary Contact(s) Created Required Software
Rich Glor 7 March 2009 [WWW]R
Example Datafile Prerequisites
See Introduction R for Phylogenetics parts I

Introduction|I. Getting Started|II. Tree Basics|III. Loading Character Data|IV. Testing Phylogenetic Signal|V. Ancestral Reconstruction|VI. Testing Patterns

Introduction


Now that you have the packages we need installed and a basic understanding of how to get help interactively, you're ready to start looking at trees. R is capable of dealing with many types of trees (e.g., NEWICK, NEXUS, PHYLIP). It’s also capable of doing a variety of tree conversions and writing/exporting trees in most commonly used formats. Our goals in this exercise are to (1) open and view NEXUS formatted phylogenetic trees in R, and (2) understand how these trees are stored in R. You will need to use the file [WWW]anolis_mtDNA.mrb.con.

A. Opening, Storing & Viewing Trees


1. To load a basic NEWICK formatted tree or trees (e.g., (1,2),3)), you can use the read.tree command. Because most widely used phylogenetic reconstruction methods (e.g., PAUP, MrBayes) produce trees in the slightly more complicated NEXUS format, however, we’re going to learn to use a variant of the basic read.tree command called read.nexus. We’re going to practice using this command by uploading the trees contained in the .con file that is output when you use the sumt command in MrBayes to generate a consensus of your post-burning topologies. Recall that the .con file generated by MrBayes contains two trees, one with posterior probability values and one without. Fortunately, uploading a NEXUS file containing two trees isn’t a problem for read.nexus. To use this command, we simply type the name of the function followed by the name of the tree file we’d like to import into R enclosed in parentheses and quotes:

This command should save the trees in your .con file to the object called anolisTrees without producing any on-screen output and quickly give you another R prompt. More likely, however, you ended up with a message saying:

    Error in file(file, "r") : cannot open the connection 
        In addition: Warning message:
        In file(file, "r") :
                cannot open file 'anolis_mtDNA': No such file or directory

You’re getting this error because you haven’t yet told R the location of the file you’d like to open. R doesn’t search your entire hard drive for a particular file, it only searches in the folder that is active as its working directory. Before you can load your tree, you need to tell R which directory you’ll be working from. To do this you simply go to the ‘Misc’ menu, select ‘Change working directory...’, and choose the desired location (For this to work you need to actually click open the folder you’d like to be working in before clicking the Open button in R: if you just click on the name of the folder you’d like to work from and click Open you may end up in the directory above the one you’d actually like to be in). After you’ve selected the appropriate directory, try running the read.nexus line above a second time. If you get another R prompt after hitting return, you’re ready to move on to step 2.

2. To see if things worked and to begin exploring how R stores trees, type the name of the object storing the trees that you just imported:

You should get a line telling you that 2 trees have been uploaded:

3. Let’s keep things simple for the time being by focusing on only one of the two trees we just uploaded to R. To isolate the first tree in your original .con file (the one with the posterior probability values) and save it to a new object in R simply type:

The double square brackets around the number 1 that follows anolisTrees is being used to tell R that you only want the first element of the two-part object, or list, called anolisTrees. (we’ll talk more later about how square brackets are used to specify elements of R objects and lists). By using square brackets and associated assignment arrow, we generate a new object called anolisTree that contains only the one tree that we want to work with. Now try typing anolisTree and hit return. You should see a few lines telling you how many tips and internal nodes your tree has, a partial list of tip labels, node labels, and a comment telling you that you have an unrooted tree with branch lengths:

    Phylogenetic tree with 189 tips and 175 internal nodes. 
        Tip labels:
                Bplumifrons, Pacutirostris, heteroderma, inderanae, nicefori, aeneus, ...
                Node labels:
                , 0.99, 1.00, 0.81, 1.00, 1.00,...

        Unrooted; includes branch lengths.

4. Now the exciting part; take a look at your tree graphically. To look at the original two tree file you can type:

The general plotting function of R (plot) should automatically recognize your object anolisTree as a tree file and plot it accordingly in a new window labeled ‘Quartz’.
In this case you will be prompted to Hit <Return> to see next plot: because there are two trees included in the standard .con file produced by MrBayes. To look at the single tree, you can type:

5. There are lots of options for tree plotting in R; indeed, this is one of R’s greatest strengths. Let’s begin with the basics by changing the size of the font so that our taxon names are a bit more visible. This can be done using a general function in R called cex, which is used to scale the size of graphical elements (values of cex less than 1 will result in text that is smaller than the default, whereas values greater than 1 will result in text that is larger than the default):

Getting accustomed to the use of cex will take some trial and error. On my monitor a value of 0.5 results a nicely readable image, but this may not be the case with your monitor. Try experimenting with the value of cex to get an image that looks good on your screen.

6. Let’s add one more feature to our tree - the posterior probability values assigned by MrBayes to each node. The simplest way to do this is to include an extra command in our original plotting function:

Because it can be hard to read these text values, let's try putting symbols on nodes that reflect support. To do this, we're going to convert posterior probability values into colors and then used those colors to paint circles placed at each node. Let's begin by taking a look at our nodal support values:

Now we're going to create a new matrix and use this matrix to convert the support values into colors. We're going to use black to indicate nodes with high support (pp > 0.95), gray to indicate nodes with moderate support (0.75 < pp < 0.95), and white to indicate nodes with poor support (pp < 0.75).

You should now have a matrix of colors that reflects your posterior probability values. You can confirm this by looking at both matrices:

If these look accurate, you're ready to throw some circles with support values onto your tree. To to this you need to plot the tree and then add the support values.

plot(anolisTree) 
nodelabels(pch=21, cex=0.5, bg=p)

B. Pruning Taxa from a Phylogenetic Trees


I’ve always found pruning taxa from trees to be one of the most annoying tasks in my own phylogenetic comparative analyses: R makes this task relatively painless. One of the best features of R’s tree-pruning method is that it can be easily implemented across a large sample of trees, making it possible to quickly prune all the trees in a large sample of trees in a Bayesian posterior sample, for example. For now, though, we’re just going to work with the single tree we stored above as anolisTree.

1. To prune terminal taxa we need to use the drop.tip function of Ape. Let’s use this function to delete one of the outgroup taxa from our previously stored tree for Anolis lizards (anolisTree). To do this, we type the name of the function, followed by parentheses bracketing the name of the tree we’d like to prune and the name of the species we’d like to prune enclosed in quotes. The following command will prune the species Pacutirostris from the tree anolisTree and store the results to a new file called anolisTree_noPacutirostris.

     >drop.tip(anolisTree, “Pacutirostris”) -> anolisTree_noPacutirostris

2. Now let’s try deleting multiple taxa by taking advantage of R’s combine function (c). Combine is used to identify and bind together a string of variables. In the lines below, we’re going to go return to our original tree (anolisTree), simultaneously delete two outgroup taxa (Pacutirostris and Bplumifrons) and store the resulting tree to an object called anolisTree_noOutGroups.

     >drop.tip(anolisTree, c("Pacutirostris", "Bplumifrons")) ->  anolisTree_noOutGroups

Try plotting your newly pruned tree to see if your work was successful.

C. Understanding How Trees are Built in R


Before going any further, we need to start learning how R stores and builds trees. This is absolutely essential if we’re going to learn how to take full advantage of R and eventually write our own functions to exploit R’s potential. Instead of storing each tree as a single line of text (as is the case with NEWICK formatted trees), R stores trees as objects consisting of several distinct components. Multipart objects stored in this manner in R are referred to as ‘lists.‘ There are several different ways of storing trees in R. So far, we’ve been working with the most widely-used format, which is called the ‘phylo’ format. This tree format was created by Paradis for his Ape package. If we type ‘help(read.nexus)’ we can see the components that make up a ‘phylo’ formatted tree file R - they’re listed under the Value heading: edge, edge.length, tip.label, node.label, and root.edge.

1. Because we’ve already seen what happens when we type anolisTree, let’s try looking in more detail at one specific component of our phylo-formatted file by typing:

     >anolisTree$tip.label

The $ after anolisTree is used by R to specify a specific element of a list. In this case, we’re using the $ to tell R that we only want to see the element of the anolisTree list that contains the names of the tip taxa. After typing this command, you should be given a list of the names of the species included in your tree file.

2.The edge component of a phylo-formatted tree describes the tree’s topology in a series of lines. View this component of your tree by typing:

     >anolisTree$edge

Each of the rows in the resulting output indicates an edge (or branch) in your tree. Edges are defined by the tips or nodes they connect. In order to do define edges, R first assigns each tip and node in your tree a unique integer. The list produced by anolisTree$edge tells you which numbers (tips or nodes) are connected by each edge.

How do you figure out which numbers correspond with specific tips and nodes in your actual tree? The numbers assigned to tips are indicated in the tip.label component of a tree file and can be obtained by typing anolisTree$tip.label. Numbering of internal nodes is somewhat less intuitive, but if you’d like to see which nodes correspond with which numbers you can type nodelabels() after plotting your tree:

     >plot(anolisTree)
     >nodelabels()

The other components of a tree file (Edge.length, Nnode, and Node.label) are bit more intuitive than edge. Edge.length simply tells R how long each edge defined in the edge list is. Nnode simply tells you the number of internal nodes. The other two components - Node.label and root.edge - are optional. Node.label is used to store posterior probabilities in the tree we’re dealing with.

D. Exporting Trees from R


The tree format used to store trees in R is convenient for many internal R applications, but cannot be interpreted by most other phylogenetic software. For this reason, we need to learn how to translate R trees back into trees that can be read and used by other applications.

To get NEWICK formatted trees, use the write.tree function of Ape:

     >write.tree(anolisTree, file=”anolisTree.newick”)

The resulting tree will be saved to your current working directory with the name anolisTree.newick.

To get NEXUS formatted trees, use the write.nexus function of Ape:

     >write.nexus(anolisTree, file=”anolisTree.nex”)

E. Ultrametric Trees in R


For most analyses of character evolution, we're interested in using ultrametric trees where branch lengths correspond with time (either relative or absolute) rather than the amount of change inferred by MrBayes. The R package Ape offers relatively limited options for rendering trees ultrametric. One of these options is to render an existing tree ultrametric using Sanderson's semi-parametric penalized likelihood approach. You can implement this method using the chronopl function. This function requires that you provide a user-specified value for the lambda, which is the smoothing parameter for the penalized likelihood function. We're going to generate a chronogram using penalized likelihood with an arbitrary lambda value of 0.1. We're going to apply this function to the tree from which outgroups have been deleted because the long branches leading to the outgroup taxa can have an undesirably strong effect on the penalized likelihood function.

chronopl(anolisTree_noOutGroups, lambda=0.1) -> anolisTree_noOuts_PL

To confirm that we actually recovered a chronogram we can plot our original MrBayes phylogram next to our new chronogram in a new plotting window. To look at these two trees at the same time, we're going to use the multiple plotting function of R called mfcol. mfcol works by dividing you plotting area into a table with multiple rows and/or columns into which plots can be placed. By typing par(mfcol=c(1,2)) we can generate a plotting area with one row and three columns.

par(mfcol=c(1,2)) 
plot(anolisTree_noOutGroups, cex=0.25)
plot(anolisTree_noOuts_PL, cex=0.25)

Although trees generated by the penalized likelihood function might be useful as a quick and dirty approximation of a chronogram, this method is not longer the state of the art approach for generating ultrametric trees. Modern Bayesian methods that work directly from aligned sequences generally offer superior performance and are nearly always preferred for modern analyses of character evolution (you are likely to encounter problems with reviewers if you try to use trees generated using PL or other outdated methods for generating ultrametric trees when aligned sequence data is available). For this reason, we're going to upload a new tree that we previously generated using the Bayesian divergence time estimation program BEAST.

     >read.tree("anolis_mtDNA_chronogram.tre") -> anolisChronogram


Continue to part III: Loading Character Data into R
This is a Wiki Spot wiki. Wiki Spot is a 501(c)3 non-profit organization that helps communities collaborate via wikis.