|Primary Contact(s)||Created||Required Software|
|Greg Pauly||8 March 2009||GARLi v0.951 graphical for Mac, TextWrangler (for Macs) or TextPad (for PCs)|
Maximum likelihood is one of the most widely used optimality criteria for phylogenetic tree reconstruction. Unfortunately, full maximum likelihood analyses of many datasets are extremely computationally expensive. As a result, several software packages have been developed to conduct rapid maximum likelihood analyses; two common ones are GARLI and RAxML. In this tutorial, we will reconstruct trees using the graphical version of GARLI v0.951, which runs only on Macintosh. If you are on a different platform, or want to upgrade to GARLI v0.96 (which has a number of improvements over v0.951 but no graphical interface), you need to download GARLI v0.96 instead; a tutorial for GARLI v0.96 is also available on this site. For this tutorial, you need GARLI v0.951 and a text editor. If you don't already have these, follow the links above to download GARLI and TextWrangler (Mac) or TextPad (Windows).
Introduction: What is GARLI?
Here's a short algorithm description from page 2 of the GARLI Manual (Zwickl, 2008).
"GARLI is loosely based on the program GAML, by Paul O. Lewis (1998). It uses a stochastic genetic algorithm-like approach to simultaneously find the topology, branch lengths and substitution model parameters that maximize the log-likelihood (lnL). This involves the evolution of a population of solutions termed individuals, with each individual encoding a tree topology, a set of branch lengths and a set of model parameters. Each individual is assigned a fitness based on its lnL score. Each generation random mutations are applied to some of the components of the individuals, and their fitnesses are recalculated. The individuals are then chosen to be the parents of the individuals of the next generation, in proportion to their fitnesses. This process is repeated many times, and the population of individuals evolves toward higher fitness solutions. Note that the highest fitness individual is automatically maintained in the population, ensuring that it is not lost due to chance (genetic drift).
The mutation types used by GARLI are divided into three types: topological mutations, model parameter mutations and branch-length mutations. Topological mutations consist of the standard NNI and SPR rearrangement types, as well as a localized form of SPR in which the pruned subtree may only be reattached to branches within a certain radius of its former location. Topological mutations are followed by a variable amount of branch-length optimization. Model mutations simply choose one of the model parameters and multiply it by a gamma-distributed variable with mean 1.0. When branch-length mutations are performed, a number of branches are chosen and each has its current length multiplied by a different gamma-distributed variable.
The GARLI GUI is a wrapper around the GARLI program that creates the config file for your run, and provides a real-time visual display of your run progress. Currently, the GUI version of GARLI is only available in v0.951 but it should be available shortly in v0.96."
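To make the quoted description more concrete, here is a minimal Python sketch of a genetic-algorithm-like search with the same ingredients: a population of individuals, gamma-distributed multiplicative mutations with mean 1.0, fitness-based choice of parents, and automatic retention of the best individual. It is not GARLI's code. The individuals here hold only a few numeric parameters (no topology or branch lengths), `toy_lnl` is a made-up stand-in for a real likelihood function, and the weighting of parents by `exp(lnL - best lnL)` is an assumption chosen for illustration.

```python
import math
import random


def toy_lnl(params):
    """Stand-in log-likelihood: highest (0.0) when every parameter equals 2.0."""
    return -sum((math.log(p) - math.log(2.0)) ** 2 for p in params)


def mutate(params, rng, shape=50.0):
    """Multiply one randomly chosen parameter by a gamma variate with mean 1.0."""
    child = list(params)
    i = rng.randrange(len(child))
    # gammavariate(alpha, beta) has mean alpha * beta, so beta = 1 / alpha gives mean 1.0.
    child[i] *= rng.gammavariate(shape, 1.0 / shape)
    return child


def evolve(pop_size=8, n_params=3, generations=300, seed=1):
    """Run the toy search; return the final best individual and the best lnL per generation."""
    rng = random.Random(seed)
    population = [[rng.uniform(0.5, 5.0) for _ in range(n_params)]
                  for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        scores = [toy_lnl(ind) for ind in population]
        best = max(range(pop_size), key=scores.__getitem__)
        best_history.append(scores[best])
        # Assumed fitness transform: weight each individual by exp(lnL - best lnL).
        weights = [math.exp(s - scores[best]) for s in scores]
        parents = rng.choices(population, weights=weights, k=pop_size - 1)
        # The best individual is carried over unmutated so it cannot be lost by chance.
        population = [list(population[best])] + [mutate(p, rng) for p in parents]
    return population[0], best_history


if __name__ == "__main__":
    best_individual, history = evolve()
    print("first best lnL:", history[0])
    print("final best lnL:", history[-1])
```

Because the best individual is always retained, the best score in the population can never decrease from one generation to the next, which is the behavior you see in the lnL plot during a run.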
Maximum Likelihood Search in GARLI
1. Open GARLI. You can execute the program by double-clicking on the GARLI icon.
2. In GARLI-GUI, open the primatespaup.nex file by going to the File menu and selecting Open….
3. GARLI should display a series of intuitive menus that allow you to set up and run your analysis. Because GARLI is designed for the rapid analysis of larger datasets, the default model is GTR+I+G. However, you should still conduct proper model selection and set up the analysis accordingly. Let's change the default model by going through the menus so that our analysis runs under HKY+I.
3a. In the General menu, give your output files a name.
3b. In the Model menu, set the model to HKY+I (two substitution rates, estimated base frequencies, an estimated proportion of invariant sites, and no gamma-distributed rate heterogeneity).
3c. Leave the Genetic Algorithm settings unchanged.
4. Go to the Run window and run the analysis by clicking Run. You will immediately see the progress of your run and the increasing lnL score.
5. Rescale your plot by changing the minimum/maximum values of the axes.
AVOID GETTING TRAPPED IN LOCAL OPTIMA!!!
It is critical that you do multiple runs when conducting a GARLI analysis. By doing this, you can compare the results you obtain from the different runs to determine if any of your tree searches became trapped in local optima. If all of your runs converge on very similar lnL scores, then you can be more confident that you found the optimal tree. A common rule of thumb is to run enough analyses that you recover the best tree at least twice.
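The comparison of runs can be sketched in a few lines of Python. This is only an illustration: the scores in `example_scores` are made up, the function names are hypothetical, and the `tolerance` of 0.01 lnL units is an assumed threshold for "very similar" rather than a value recommended by GARLI. Note also that matching scores only suggest that two runs found the same tree; they do not prove that the topologies are identical.

```python
def summarize_runs(lnl_scores, tolerance=0.01):
    """Return the best lnL and the indices of runs within `tolerance` of it."""
    if not lnl_scores:
        raise ValueError("need at least one run")
    best = max(lnl_scores)  # lnL scores are negative; the largest is the best
    near_best = [i for i, s in enumerate(lnl_scores) if best - s <= tolerance]
    return best, near_best


def best_found_twice(lnl_scores, tolerance=0.01):
    """Rule of thumb: was the best score reached by at least two runs?"""
    _, near_best = summarize_runs(lnl_scores, tolerance)
    return len(near_best) >= 2


if __name__ == "__main__":
    # Made-up final lnL scores from four hypothetical runs.
    example_scores = [-5932.417, -5932.419, -5940.108, -5932.417]
    best, near = summarize_runs(example_scores)
    print("best lnL:", best)
    print("runs near the best score:", near)
    print("best score found at least twice:", best_found_twice(example_scores))
```

In this made-up example, the third run scores several lnL units worse than the others, which is the kind of result you would expect from a search that became trapped in a local optimum.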
6. Score each tree from your multiple GARLI runs under likelihood in PAUP*. This is necessary because GARLI may report identical trees with very similar (but not identical) scores; identical scores are recovered only when branch lengths and model parameters are fully optimized. So now let's open PAUP* and score our GARLI tree(s) under likelihood. Refer to the PAUP* exercise for instructions on how to download, install, and open PAUP*.
7. Execute the data file. NOTE: Be sure that you have commented out the Character/Taxon and PAUP blocks that we used in the previous exercise; you may need to re-save the file after putting the open and close brackets back into it.
> exec primatespaup.nex;
8. Change the optimality criterion to likelihood.
> set crit=likelihood;
9. Get the likelihood tree generated by GARLI.
> gettrees file=MLboot.tre mode=3;
10. Score the tree. Be sure to set the appropriate model, in this case HKY+I.
> lscore all/nst=2 basefreq=estimate tratio=estimate pinvar=estimate;
Nonparametric bootstrapping in GARLI
Conducting a bootstrap in GARLI is very easy. Follow the instructions above to set up a run, then specify the number of bootstrap replicates in the General menu. When you conduct a nonparametric bootstrap run, GARLI generates a file containing the best tree from each bootstrap replicate. These trees can then be read into PAUP*, and a majority-rule consensus tree can be computed as was done in the PAUP* tutorial for the parsimony bootstrap. (It is not necessary to specify the use of treeweights in PAUP* when importing the trees, because you will have only one tree per pseudoreplicate.)
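To show what a majority-rule consensus summarizes, the following Python sketch counts how often each bipartition (split) of the taxa appears across a set of bootstrap trees and keeps the splits found in more than half of them. It is an illustration only, not a replacement for PAUP*: the function names are hypothetical, the three example trees are made up, and the parser assumes plain Newick strings with unquoted taxon names (not the NEXUS tree file itself, and no bracketed comments).

```python
from collections import Counter


def parse_clades(newick):
    """Return (taxa, groups) for one Newick string; lengths and internal labels are ignored."""
    stack, groups, taxa = [], set(), set()
    name, skipping = "", False

    def flush():
        nonlocal name
        if name:
            taxa.add(name)
            if stack:
                stack[-1].add(name)
            name = ""

    for ch in newick.strip():
        if ch in "(),;":
            flush()
            skipping = False
            if ch == "(":
                stack.append(set())
            elif ch == ")":
                group = stack.pop()
                groups.add(frozenset(group))
                if stack:
                    stack[-1] |= group
                skipping = True  # skip any internal label that follows ")"
        elif ch == ":":
            flush()
            skipping = True  # skip the branch length that follows ":"
        elif not skipping and not ch.isspace():
            name += ch
    return taxa, groups


def splits(newick):
    """Nontrivial bipartitions, each stored as the side lacking the reference taxon."""
    taxa, groups = parse_clades(newick)
    ref = min(taxa)
    result = set()
    for g in groups:
        side = g if ref not in g else frozenset(taxa - g)
        if 2 <= len(side) <= len(taxa) - 2:
            result.add(side)
    return result


def majority_rule(newicks, threshold=0.5):
    """Map each split found in more than `threshold` of the trees to its frequency."""
    counts = Counter()
    for nwk in newicks:
        counts.update(splits(nwk))
    n = len(newicks)
    return {s: c / n for s, c in counts.items() if c / n > threshold}


if __name__ == "__main__":
    # Three made-up bootstrap trees on taxa A-E.
    trees = [
        "((A:0.1,B:0.1):0.05,(C:0.1,D:0.1):0.05,E:0.1);",
        "((A,B),C,(D,E));",
        "((A,C),B,(D,E));",
    ]
    for split, freq in majority_rule(trees).items():
        print(sorted(split), round(freq, 2))
```

Storing each split as the side that lacks a fixed reference taxon means that the same bipartition is counted as identical no matter how the unrooted tree happens to be written.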