Parsimony (PAUP*)

Primary Contact(s): Greg Pauly
Created: 8 March 2009
Required Software: PAUP (not free); Text editor: TextWrangler (for Macs) or TextPad (for PCs)
Example Datafile: primatespaup.nex
Presentation Slides
Powerpoint: PaupGarli.ppt
PDF: PaupGarlipowerpoint.pdf

Introduction


Parsimony is one of the most widely used optimization methods for phylogenetic tree reconstruction. For this tutorial, we will be reconstructing trees using parsimony with PAUP*. For decades, PAUP* was the gold standard for phylogenetic tree reconstruction software. In addition to parsimony, PAUP* is capable of reconstructing trees using distance-based and maximum likelihood methods. PAUP* also includes many useful tools for examining and visualizing trees. This said, PAUP* hasn't been updated for years and many of its applications are no longer state of the art.
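
To make the quantity being optimized concrete, here is a minimal Python sketch of how a parsimony tree length can be counted with Fitch's algorithm for unordered characters. It illustrates the criterion only and is not PAUP*'s implementation; the four taxa, the short alignment, and the function names are hypothetical.

```python
def fitch_score(tree, states):
    """Return (state_set, changes) for one character on a rooted binary tree.

    tree: a taxon name (leaf) or a 2-tuple of subtrees.
    states: dict mapping taxon name -> observed state for this character.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_score(left, states)
    right_set, right_cost = fitch_score(right, states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed here.
        return shared, left_cost + right_cost
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_length(tree, alignment):
    """Sum the Fitch change counts over every column of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_score(tree, column)[1]
    return total


# Hypothetical toy data, not taken from primatespaup.nex.
alignment = {"A": "ACGT", "B": "ACGA", "C": "GCTA", "D": "GCTT"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(parsimony_length(tree_1, alignment))  # 4 changes
print(parsimony_length(tree_2, alignment))  # 6 changes
```

Under the parsimony criterion, tree_1 is preferred because it requires fewer changes. A search compares candidate trees by exactly this kind of score.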

In this exercise, we cover the basics of analyzing data using parsimony in PAUP*, as well as some simple commands for examining the resulting tree files. For more information about PAUP*'s capabilities, see the PAUP* manual (Paupmanual.pdf). The manual describes the available commands and their numerous options. It also explains the syntax needed to operate the program and provides numerous examples that help make clear how to use PAUP*.

Tutorial


Installation

1. Place PAUP and the example nexus file in a folder named "paup" on your desktop...

2. To run the program you must first open a terminal window and navigate to the paup folder on your desktop. If you're on a Mac running OS X, this can generally be done by typing:

>cd Desktop/paup

Check that the software runs properly by typing ./paup. If you see a few lines of text introducing PAUP, the program is running OK and you are ready to proceed.

Parsimony Heuristic Search

1. If you haven't already done so, start PAUP* by typing:

./paup

This command should bring up the PAUP prompt (paup>), from which you can enter subsequent commands.

2. Execute the sample data file called primatespaup.nex by typing:

> exec primatespaup.nex;

If all goes well, PAUP* should print details about the data matrix to the screen:

Data matrix has 12 taxa, 898 characters.

3. Now is a good time to start a log file. This log file will be used to store all of the output that PAUP* presents in your Terminal window. This is a very helpful option to keep track of your analysis, especially when referencing the analysis much later.

> log file=primateshsearch.log;

4. Before running the heuristic search, we need to consider some of the options for our overall search settings. To view these options, we can use the help functions of PAUP that are called with the use of a question mark following the command of interest. As you run PAUP, you will likely use this routinely to figure out command syntax and options. If you need more detail on these options refer to the PAUP manual.

> set ?;

After typing set ?, you should see options that let you change the optimality criterion, the rooting method and parameters, and the maximum number of trees saved in a search. These are all settings you will change regularly. For this exercise, we're going to run a heuristic search under the parsimony criterion. In many cases, you'll want to be able to store more than 100 trees from your heuristic search. To tell PAUP it's OK to increase the number of stored trees as necessary, type:

>set increase=auto;

5. Now let's look at the options for a heuristic search.

> hsearch ?;

6. Before running a heuristic analysis, we need to consider how to build the starting tree, how we will swap on this tree, and how many times we want to build a new starting tree. All of these considerations matter if we hope to avoid recovering non-optimal topologies during our heuristic search (recall that heuristic methods are not guaranteed to find the most parsimonious tree). In this exercise, we will use a random stepwise addition sequence with TBR branch swapping and 20 replicates of the random addition sequence.

>hsearch start=stepwise addseq=random nreps=20 swap=TBR;
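
To see why a heuristic search is used here rather than scoring every possible tree, the short Python sketch below counts the distinct unrooted, fully bifurcating trees for a given number of taxa using the standard (2n-5)!! formula. The function name is hypothetical and the snippet is only an illustration.

```python
def num_unrooted_trees(n_taxa):
    """Number of distinct unrooted, fully bifurcating trees: (2n - 5)!!"""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


print(num_unrooted_trees(4))   # 3
print(num_unrooted_trees(12))  # 654729075
```

With the 12 taxa in this dataset there are already more than 650 million candidate topologies, and the count grows very quickly as taxa are added. Random addition sequence replicates start the search from several different places in this space, which reduces the chance of stopping at a locally optimal tree.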

7. PAUP* should display a summary of the search along with a table of the results.

8. To look at the tree(s), type:

> showtree all;

To view just one of two or more stored trees, follow this command with the number of the tree you'd like to see. To look at tree 1, type:

> showtree 1;

How does our tree look?

9. If you are familiar with primates, the rooting on your current tree may look somewhat strange. This is because the tree was rooted by default with the first taxon in the dataset. To change our root to the correct location we want to define Lemur_catta as the outgroup. We know that Lemur is the outgroup to the remaining 11 taxa based on other lines of evidence. If we knew nothing about primates, we would have to find a different taxon that we know is outside of the ingroup for our outgroup rooting (e.g., a rodent).

> outgroup Lemur_catta;

10. How does our tree look now? The root looks a bit strange. We need to set the rooting parameter. Root the trees so that the outgroup taxa are monophyletic with respect to the ingroup.

> set outroot=mono;

11. Now let's look at tree 1 with the correct root.

> showtree 1;

12. Now let's incorporate branch length information by describing the optimal tree as a phylogram (a tree with parsimony branch lengths).

> describe /plot=phylo;

You may notice that the phylogram looks a bit funny. This is because we changed the outgroup after the search was completed. A quick way around this is to change the rootmethod by typing:

> set rootmethod=mid;

To look at the tree with midpoint rooting, then switch back to outgroup rooting and plot the tree again for comparison, type:

> describe /plot=phylo;
> set rootmethod=out;
> describe /plot=phylo;

Note the forward slash (/); this is important for many commands. Values before the slash indicate which trees you wish to include; options that apply to the primary command (for example, describe or contree) are placed after the slash.

13. Save the trees to file.

> savetrees file=Primate.tre brlen=yes;

14. We recovered two trees, but we don't know how these two trees differ. To find this out, compute a strict consensus of your best trees.

> contree all /strict=yes show=yes;
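
As a rough illustration of what a strict consensus keeps, the Python sketch below collects the clades of each tree and retains only those found in every tree. It treats trees as rooted nested tuples, which is a simplifying assumption and not how PAUP* stores trees; the taxon labels and function names are hypothetical.

```python
def clades(tree):
    """Return (leaf_set, set_of_clades) for a rooted tree of nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)  # the clade subtended by this internal node
    return leaves, found


def strict_consensus_clades(trees):
    """Keep only the clades that appear in every input tree."""
    clade_sets = [clades(t)[1] for t in trees]
    return set.intersection(*clade_sets)


# Two hypothetical equally parsimonious trees that disagree about A, B and C.
tree_a = ((("A", "B"), "C"), ("D", "E"))
tree_b = ((("A", "C"), "B"), ("D", "E"))

shared = strict_consensus_clades([tree_a, tree_b])
for clade in sorted(shared, key=len):
    print(sorted(clade))
```

The groupings A+B and A+C each occur in only one tree, so they collapse into a polytomy within the A+B+C clade, while D+E is retained because both trees contain it.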

15. We have now finished our heuristic search. So now, let's clear the trees from memory, stop our previous log file, and reset the defaults for the Heuristic search.

> cleartrees;
> log stop;
> defaults hsearch;

Conducting a Parsimony Bootstrap Search in PAUP*

1. Start PAUP*

./paup

2. Execute the example data file.

> exec primatespaup.nex;

3. Start a log file.

> log file=primateBoot.log;

4. Set the rooting. (If you didn't quit after the previous exercise, the outgroup may already be specified.)

> outgroup Lemur_catta; 
> set outroot=mono;

Here we are setting the outgroup much earlier in this analysis than we did in the previous analysis. This is the appropriate sequence: you should get in the habit of setting the outgroup before you conduct your analyses. This will save you from some hassles when saving trees and viewing them in other programs, such as FigTree.

5. To run a bootstrap search we need to specify details of the bootstrap search and the heuristic search of each bootstrap pseudoreplicate. The set-up should look like this:

>bootstrap [bootstrap options] [/hsearch options]

Here, the forward slash separates the bootstrap options from the heuristic search options for each bootstrap pseudoreplicate.

> bootstrap nreps=200 treefile=boot.tre search=heuristic/ start=stepwise addseq=random nreps=10 swap=TBR;
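
For readers new to bootstrapping, the following Python sketch shows what one pseudoreplicate is: a new matrix of the same size, built by sampling alignment columns with replacement. It only illustrates the resampling idea under simple assumptions and is not PAUP*'s code; the toy alignment, the seed, and the function name are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the same length."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        taxon: "".join(seq[i] for i in columns)
        for taxon, seq in alignment.items()
    }


rng = random.Random(1)  # fixed seed so the sketch is repeatable
alignment = {"A": "ACGTAC", "B": "ACGAAC", "C": "GCTAAT"}  # hypothetical data
pseudo = bootstrap_alignment(alignment, rng)

for taxon, seq in pseudo.items():
    print(taxon, seq)
```

In the command above, nreps=200 before the slash requests 200 such pseudoreplicates, and the options after the slash control the heuristic search run on each one.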

6. If you want to save a tree that has the bootstrap proportions on it, you need to save this tree immediately after the bootstrap analysis is complete by typing:

> savetrees file=bootMajRule.tree from=1 to=1 savebootp=nodelabels;

In this command, it is necessary to specify the tree you'd like to save using the from and to options, even if there is only one tree in memory. The savebootp=nodelabels option is also important: it saves the bootstrap proportions as node labels. The tree can then be opened in other programs (e.g., FigTree) that permit visualization of the node labels.

7. We might also want to summarize our bootstrap proportions after the analysis was completed. You might do this if you were combining tree files from multiple analyses run on different computers. So let's make a new consensus tree from our tree file. Open the treefile by typing:

> gettrees file=boot.tre StoreTreeWts=yes mode=3;

There are two important points here: the tree weights and the mode option for importing trees. Look at your tree file in a text editor. Note that multiple trees were found for some bootstrap pseudoreplicates. We don't want to count each of these trees as an independent observation; instead, we weight each tree by the inverse of the number of trees found for its replicate. This matters for both bootstrap and jackknife replicates.

Here's the description from the PAUP manual (p. 46): "Trees output to a treefile in bootstrap and jackknife contain a weight comment (see page 25). These weights are the reciprocal of the described on page 67 will optionally store these weights, and the majority-rule consensus calculator will use them when USETREEWTS= YES. This allows calculation of majority-rule consensus trees that correspond exactly to the consensus tree output by the original bootstrap or jackknife command. This option allows the combination of bootstrap results from runs performed at different times or on different machines and the recovery of results obtained prior to a system crash."
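
The effect of these weights can be sketched in a few lines of Python. In this hypothetical example each tree is reduced to the set of clades it contains, and each tree from a replicate that returned k trees receives weight 1/k. The data and function name are made up for illustration and do not reproduce PAUP*'s internals.

```python
from collections import defaultdict


def clade_support(replicates):
    """Weighted clade frequencies across bootstrap pseudoreplicates.

    replicates: list of replicates; each replicate is a list of trees, and
    each tree is represented by the set of clades (frozensets) it contains.
    """
    support = defaultdict(float)
    for trees in replicates:
        weight = 1.0 / len(trees)  # reciprocal of the number of trees found
        for tree_clades in trees:
            for clade in tree_clades:
                support[clade] += weight
    n_reps = len(replicates)
    return {clade: total / n_reps for clade, total in support.items()}


AB, CD, AC = frozenset("AB"), frozenset("CD"), frozenset("AC")

# Four hypothetical pseudoreplicates; the third one found two tied trees.
replicates = [
    [{AB, CD}],
    [{AB, CD}],
    [{AB}, {AC}],
    [{AC}],
]

support = clade_support(replicates)
majority = {clade for clade, freq in support.items() if freq > 0.5}
print(support[AB], support[CD], support[AC])  # 0.625 0.5 0.375
```

Without the weights, A+B would be counted in 3 of 5 trees (0.6) rather than in 2.5 of 4 replicates (0.625), so the replicate that returned two trees would be over-represented.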

Mode option refers to how the imported trees and the trees existing in memory are treated by PAUP*.
mode = 1 Keep only the trees from the file that are not already in memory
mode = 2 Keep only the trees from the file that are also in memory
mode = 3 (DEFAULT) Replace all of the trees currently in memory with those imported from file
mode = 4 Keep the trees in memory that are not also in the file
mode = 5 Keep the trees that are either currently in memory or in the file, but not in both
mode = 7 Append trees from file to the trees currently in memory
[There is no mode = 6]
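
One way to remember the modes is as set operations on the trees in memory and the trees in the file. The Python sketch below restates the list above in those terms. It is a simplification (for example, it ignores tree order and any duplicates that appending might keep), and the function name and tree labels are hypothetical.

```python
def apply_mode(in_memory, from_file, mode):
    """Return the trees left in memory after importing with the given mode."""
    memory, file_trees = set(in_memory), set(from_file)
    if mode == 1:
        return file_trees - memory   # file trees not already in memory
    if mode == 2:
        return file_trees & memory   # file trees that are also in memory
    if mode == 3:
        return file_trees            # replace memory with the file trees
    if mode == 4:
        return memory - file_trees   # memory trees not also in the file
    if mode == 5:
        return memory ^ file_trees   # in memory or in file, but not both
    if mode == 7:
        return memory | file_trees   # append (duplicates ignored in this sketch)
    raise ValueError("no such mode: %r" % mode)


memory = {"t1", "t2"}      # hypothetical tree labels
file_trees = {"t2", "t3"}

for mode in (1, 2, 3, 4, 5, 7):
    print(mode, sorted(apply_mode(memory, file_trees, mode)))
```

For the bootstrap summary in this step, mode 3 is the one we want, because the consensus should be computed only from the trees in boot.tre.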

8. Now let's make a majority rule consensus tree and save this tree (see step 6).

> contree all/strict=no majrule=yes usetreewts=yes treefile=bootMajRule.tre;

9. Quit PAUP*.

> quit;

Using PAUP Blocks

Instead of doing everything via command line as we have in the previous two analyses, an alternative way to run this search is to put the necessary instructions in a PAUP block in your original nexus file. These blocks are very useful for several reasons. One benefit is that your nexus file will serve as a permanent record of what you did. You will then have a single file that includes the data matrix and the analysis parameters; this is extremely helpful if long after the analysis you need to look up details of what you did months before. Further, using a single file will reduce the introduction of error that can occur when using the command line.

Let's look at our nexus file in a text editor and set up the same parsimony heuristic search we set up above (technically, this search will be slightly different from the one above because we'll exclude some characters).

Open the primatespaup.nex file in a text editor. Note that beneath the data matrix, there is a sets block (defining character and taxon sets) and a PAUP block. Currently, these blocks are commented out with brackets ([]). The blocks look like this:

BEGIN SETS; 
CHARSET  beginning  =  1-12;
CHARSET  end  =  895-898;
TAXSET  outgroups  =  Lemur_catta;
END;

BEGIN PAUP;
exclude beginning end;
outgroup outgroups;
set increase=auto  autoclose=yes;
hsearch start=stepwise addseq=random nreps=20 swap=TBR;
END;

The top block defines two character sets; one includes 12 characters and the other includes 4 characters. It also defines a taxon set named "outgroups" that includes one taxon.

In the PAUP block, the two character sets are excluded (characters deleted from the analysis) and the outgroup for the analysis is defined as the taxon set termed "outgroups" which includes a single taxon. Lastly, in this block, we also initiate a heuristic search as we did earlier using the command line. (In the previous analysis, we did not exclude any characters as we have here; so this analysis isn't completely identical to the previous one. Thus, you will get a different tree score.)
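
As a quick check on what the exclude command leaves behind, this Python sketch applies the two character-set ranges from the sets block to the 898 characters in the matrix. The ranges and total come from this tutorial; the function and variable names are hypothetical.

```python
def excluded_columns(charsets):
    """Collect the 1-based character positions covered by the given ranges.

    charsets: dict of name -> (first, last), inclusive, as written in NEXUS.
    """
    cols = set()
    for first, last in charsets.values():
        cols.update(range(first, last + 1))
    return cols


charsets = {"beginning": (1, 12), "end": (895, 898)}
n_chars = 898  # from "Data matrix has 12 taxa, 898 characters."

dropped = excluded_columns(charsets)
kept = [c for c in range(1, n_chars + 1) if c not in dropped]

print(len(dropped))       # 16 characters excluded
print(len(kept))          # 882 characters remain
print(kept[0], kept[-1])  # 13 894
```

Because 16 characters are removed before the search, the tree lengths reported from this run are not directly comparable with those from the command-line search on all 898 characters.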

Delete the open and closing brackets and save this file. Now execute this file in PAUP and a heuristic search will be run.

1. Start PAUP*

./paup

(If you want, you can also start a log file now.)

2. Execute the data file.

> exec primatespaup.nex;

Note that the output is very similar to the output generated earlier when we did the search using the command line. It would be completely identical if we deleted the command to exclude the character sets named "beginning" and "end".

3. Quit PAUP*.

> quit;
