III. Loading Character Data into R

Primary Contact(s): Rich Glor
Created: 7 March 2009
Required Software: R
Prerequisites: See Introduction R for Phylogenetics, parts I and II


Introduction


R is becoming increasingly important to the implementation of phylogenetic comparative analyses. Many new methods are developed specifically for the R platform, while many older methods are being translated into R packages. We’re going to tackle comparative analyses in R in several stages. Through the course of this exercise we’ll learn how to (1) load comparative data into R, (2) ensure that our comparative data and tree are compatible, (3) conduct basic diagnostic tests of phylogenetic signal, and (4) conduct a variety of standard phylogenetic comparative analyses.

A. Loading trait data into R


If you have some character data for the species in your phylogeny, you can easily load and analyze these data in R. Perhaps the easiest way to get your data into R is to start with a simple comma-delimited table. You can generate such a table in EXCEL by saving your EXCEL worksheet as a .csv file using the Save As function under the File menu.

I've provided you with a comma-delimited table of data called anolis_data.csv. This file contains ecological and morphological data for Greater Antillean Anolis lizards. It includes three columns: the first contains species names, the second contains discretely coded information about anole microhabitat specialization (0=grass-bush, 1=trunk-ground, 2=trunk, 3=trunk-crown, 4=crown-giant, 5=twig), and the third contains continuous data on the body size of adult males (measured as snout-to-vent length, or SVL).

1. Start by opening the anolis_data.csv file in EXCEL to see what you’re dealing with.

2. Now let’s open this same file in R using the read.csv function included in R’s base installation:

     >read.csv("anolis_data.csv") -> anolis.dat

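You can see if all has gone well by typing anolis.dat and hitting return. You should see your data scrolling across the screen.

3. Next, copy the two trait columns (microhabitat and SVL) into a new data frame called anolisData:

     >data.frame(anolis.dat[,2:3]) -> anolisData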
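
If you want to try the loading steps without the example file, here is a minimal sketch of the same workflow on a tiny made-up table. The species names, the values, and the temporary file are placeholders, not the contents of anolis_data.csv. The sketch also labels each row with its species name. That is an extra step assumed here, because name.check (used in the next section) matches tip labels against the row names of the data, and a data frame built from columns 2 and 3 alone would otherwise carry only row numbers.

```r
# Write a tiny stand-in CSV (hypothetical species and values, for illustration only)
toy_file <- tempfile(fileext = ".csv")
writeLines(c("species,microhabitat,SVL",
             "sp_a,0,40.1",
             "sp_b,1,62.5",
             "sp_c,5,45.0"), toy_file)

# Read it with read.csv, as in the tutorial
read.csv(toy_file) -> toy.dat

# Keep only the two trait columns, as in the data.frame call above
data.frame(toy.dat[, 2:3]) -> toyData

# Assumption: name each row by species so the data can later be matched to a tree
rownames(toyData) <- as.character(toy.dat[, 1])

print(toyData)
```

If your own table ends up with numbered rows rather than species names, the rownames line is the piece to adapt.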

B. Checking overlap of phylogenetic and trait datasets


Once you have a tree and associated data in R, you're well positioned to conduct a phylogenetic comparative analysis. Before doing this, however, we need to make sure that we have overlapping phylogenetic and character data. The R package geiger includes a function called name.check that makes it relatively easy to (1) check for overlap between the taxa in a phylogenetic tree and the taxa for which you have some other sort of character data, and (2) prune taxa missing from your tree or trait data in subsequent analyses.

1. Let’s consider the two datasets from Anolis lizards that we’ve already loaded into R, which include a phylogenetic tree (anolisChronogram - this tree was made during Part II of this exercise, so go back and complete this exercise if you don't still have this tree stored in R) and data on microhabitat specialization and body size (anolisData). To determine which taxa are overlapping in these two files, type:

     >name.check(anolisChronogram, anolisData)

The output should consist of two lists of taxon names, one called $Tree.not.data and another called $Data.not.tree. In our case, the fact that numerous names are present in the $Tree.not.data list means that some anole species are represented in our tree, but not in our set of character data (the reason for this is that our tree was reconstructed using anoles from across the genus, whereas character data was only available for species from the Greater Antilles whose microhabitat specialization has been characterized). By contrast, the $Data.not.tree list should be empty (indicated by character(0)), suggesting that there are no taxa in our comparative dataset that are absent from the tree.
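
To make the two lists concrete, the comparison can be imitated with base R's setdiff on two sets of names. This is only a sketch of the idea, not geiger's implementation, and the names below are hypothetical rather than real Anolis species.

```r
# Hypothetical tip labels and trait-table row names (not the real Anolis names)
tree_tips  <- c("sp_a", "sp_b", "sp_c", "sp_d", "sp_e")
data_names <- c("sp_a", "sp_b", "sp_c")

# Names in the tree that lack trait data (analogous to $Tree.not.data)
tree_not_data <- setdiff(tree_tips, data_names)

# Names in the trait data that are missing from the tree (analogous to $Data.not.tree)
data_not_tree <- setdiff(data_names, tree_tips)

print(tree_not_data)
print(data_not_tree)
```

The element names described above are those of the geiger version this exercise was written for. More recent releases, to my knowledge, call them $tree_not_data and $data_not_tree, so it is worth inspecting the names of the returned list if the output looks different on your machine.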

2. In order to conduct phylogenetic comparative analyses, we want to prune all the taxa from our tree that are not also present among our trait data. To do this we’re going to save the two lists generated by name.check under the name anolisOverlap and then use the names in the $Tree.not.data portion of this object in an application of the drop.tip function:

     >name.check(anolisChronogram, anolisData) -> anolisOverlap
     >drop.tip(anolisChronogram, anolisOverlap$Tree.not.data) -> anolisComparativeTree

Take a look at your newly pruned tree by typing plot(anolisComparativeTree).
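
The pruning step can also be tried on a small made-up tree. The sketch below assumes the ape package is installed. The five-taxon tree and the species names are hypothetical, and setdiff stands in for name.check so that the example needs nothing beyond ape.

```r
library(ape)  # provides read.tree and drop.tip

# A small hypothetical five-taxon tree written in Newick format
toy_tree <- read.tree(text = "(((sp_a,sp_b),sp_c),(sp_d,sp_e));")

# Suppose only three of those taxa have trait data
data_names <- c("sp_a", "sp_b", "sp_c")

# Tips present in the tree but absent from the data
tips_to_drop <- setdiff(toy_tree$tip.label, data_names)

# Prune them, mirroring the drop.tip call above
drop.tip(toy_tree, tips_to_drop) -> toy_pruned

print(toy_pruned$tip.label)
```

After pruning, the tree should contain only the taxa that also appear in the data, which is the condition most comparative functions expect.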

C. Visualizing Trait Data on a Phylogeny

1. We often would like to plot the data for our comparative analysis across the tips of our tree. With a bit of practice, this can be done fairly easily in R.

Once the tree is replotted, we're ready to add the points.

It is important to realize that indexing by the tip labels, as in microLabel[anolisComparativeTree$tip.label], is crucial: it sorts microLabel to match the order of the tip labels on the tree!
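
The reordering relies on ordinary name-based indexing of a named vector. A minimal sketch with made-up labels (the vector and species names here are hypothetical stand-ins for microLabel and the tree's tip labels):

```r
# Hypothetical labels stored in the order the species appear in the data table
microLabel_toy <- c(sp_a = "red", sp_b = "blue", sp_c = "green")

# Hypothetical tip order of a tree, which differs from the data order
tip_order <- c("sp_c", "sp_a", "sp_b")

# Indexing by name returns the labels rearranged into tip order
reordered <- microLabel_toy[tip_order]
print(reordered)
```

Without this step the labels would be drawn in data-table order, and any tip whose position differs between the table and the tree would receive the wrong label.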

NOTE: The points command will only work if you already have a plotting window open. This is because points is trying to add points to an already existing figure. Go back and repeat the plotting function implemented in step 1a if you closed your Quartz window between then and now.
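
One way to plot a tree and then add coloured points at its tips is sketched below. This is not necessarily the code originally used on this page. It assumes the ape package is installed, uses a hypothetical five-taxon tree and made-up colours, and reads the tip coordinates that ape stores after each call to plot.

```r
library(ape)

# Hypothetical tree and one colour per species, named by species
toy_tree <- read.tree(text = "(((sp_a,sp_b),sp_c),(sp_d,sp_e));")
toy_cols <- c(sp_e = "purple", sp_a = "red", sp_c = "green",
              sp_b = "blue", sp_d = "orange")

# Plot the tree first; label.offset leaves room for the points
plot(toy_tree, label.offset = 0.3)

# ape records the coordinates of the most recent tree plot in this environment
last_plot <- get("last_plot.phylo", envir = .PlotPhyloEnv)
n_tips <- length(toy_tree$tip.label)

# Reorder the colours to match the tip order, then add the points
tip_cols <- toy_cols[toy_tree$tip.label]
points(last_plot$xx[1:n_tips], last_plot$yy[1:n_tips],
       pch = 21, cex = 1.5, bg = tip_cols)
```

Because points draws onto the existing figure, the plot call has to come first. ape's tiplabels function, which accepts pch and bg arguments, is an alternative that looks up the tip coordinates for you.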


Continue to part IV: Testing Phylogenetic Signal in R