Maximum Likelihood (GARLI 0.96)

Primary Contact(s): Greg Pauly
Created: 8 March 2009
Required Software: GARLI v0.96; TextWrangler (for Macs) or TextPad (for PCs)
Example Datafile: primatespaup.nex

Summary

Maximum likelihood is one of the most widely used optimality criteria for phylogenetic tree reconstruction. Unfortunately, full maximum likelihood analyses of many datasets are extremely computationally expensive. As a result, several software packages have been developed to conduct fast maximum likelihood analyses. Two common programs used for this are GARLI and RAxML. For this tutorial, we will be reconstructing trees using GARLI v0.96. GARLI runs on Macintosh, Windows, and Linux. You set up the analysis by modifying a config file and then run GARLI from the command prompt.

  1. Summary
  2. Introduction
  3. Tutorial
    1. Likelihood search in GARLI Version 0.96
    2. Nonparametric bootstrapping in GARLI

Introduction

What is GARLI?
Here's a short algorithm description from page 2 of the GARLI Manual (Zwickl, 2008).

GARLI is loosely based on the program GAML, by Paul O. Lewis (1998). It uses a stochastic genetic algorithm-like approach to simultaneously find the topology, branch lengths and substitution model parameters that maximize the log-likelihood (lnL). This involves the evolution of a population of solutions termed individuals, with each individual encoding a tree topology, a set of branch lengths and a set of model parameters. Each individual is assigned a fitness based on its lnL score. Each generation random mutations are applied to some of the components of the individuals, and their fitnesses are recalculated. The individuals are then chosen to be the parents of the individuals of the next generation, in proportion to their fitnesses. This process is repeated many times, and the population of individuals evolves toward higher fitness solutions. Note that the highest fitness individual is automatically maintained in the population, ensuring that it is not lost due to chance (genetic drift).

The mutation types used by GARLI are divided into three types: topological mutations, model parameter mutations and branch-length mutations. Topological mutations consist of the standard NNI and SPR rearrangement types, as well as a localized form of SPR in which the pruned subtree may only be reattached to branches within a certain radius of its former location. Topological mutations are followed by a variable amount of branch-length optimization. Model mutations simply choose one of the model parameters and multiply it by a gamma-distributed variable with mean 1.0. When branch-length mutations are performed, a number of branches are chosen and each has its current length multiplied by a different gamma-distributed variable.
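
To make the description above concrete, here is a toy sketch of the same kind of loop. It is not GARLI's code: the fitness function is a made-up stand-in for lnL, the individuals are plain lists of numbers rather than trees, and the population size, gamma shape, and score-shifting trick used for selection are all assumptions chosen for illustration.

```python
import random

def fitness(params):
    # Toy stand-in for lnL: highest (0.0) when every value equals 1.0.
    return -sum((p - 1.0) ** 2 for p in params)

def mutate(params, rng, shape=100.0):
    # Multiply one randomly chosen value by a gamma variate with mean 1.0
    # (shape * scale = 1.0).
    child = list(params)
    i = rng.randrange(len(child))
    child[i] *= rng.gammavariate(shape, 1.0 / shape)
    return child

def evolve(pop_size=4, n_params=3, generations=200, seed=1):
    rng = random.Random(seed)
    population = [[rng.uniform(0.1, 5.0) for _ in range(n_params)]
                  for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        best = population[scores.index(max(scores))]
        history.append(max(scores))
        # Shift scores so the weights are positive; fitter individuals
        # are more likely to be chosen as parents.
        low = min(scores)
        weights = [s - low + 1e-9 for s in scores]
        parents = rng.choices(population, weights=weights, k=pop_size - 1)
        # The best individual is carried over unchanged.
        population = [list(best)] + [mutate(p, rng) for p in parents]
    return max(population, key=fitness), history

best, history = evolve()
print(f"start {history[0]:.4f}, end {fitness(best):.4f}")
```

Because the best individual is always copied into the next generation, the best score in the population can never get worse from one generation to the next, which is the point of the last sentence in the first quoted paragraph.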

Why use version 0.96 instead of 0.951?
One earlier version of GARLI was 0.951. A tutorial on this earlier version is available on this site. The benefit of 0.951 is that it is extremely easy to use on Macs because of the GUI. However, version 0.96 has a number of improvements. In particular, 0.96 (1) has amino acid and codon-based models of sequence evolution, (2) allows setting multiple search replicates from a single config file, (3) allows backbone and normal topological constraints, (4) allows exclusion sets in the nexus assumptions block, and (5) is faster for non-parametric bootstrapping. There are other improvements as well; for a full list, check the GARLI Manual.

Instead of using the GUI, as in 0.951, the analysis in 0.96 is set up by modifying a config file in a text editor and then running GARLI from the command prompt. A GUI for 0.96 should be available in the future (check the GARLI website).

Just like with PAUP, the GARLI Manual is extremely helpful and you should expect to reference it frequently. The following paragraph is taken nearly verbatim from the GARLI Manual.

All directions to the program are provided through a text-based configuration file, which by default is named garli.conf. You will need to open this file in a text editor (Textedit, BBedit, TextWrangler, Notepad, etc) to make changes. In the file you will see a series of fairly cryptically named options. The only thing that must be changed in the file is to enter your dataset name as the datafname. You should also change the ofprefix setting to give the program a prefix to be added to the beginning of output filenames. The default values for most settings should work well, at least for an initial exploratory run. The GARLI manual gives details on what the settings mean and why/how you might want to alter them. Make sure that you save the configuration file after making any changes.

In addition to the GARLI manual, there is also a very useful GARLI wiki. The Advanced Topics page has several examples with instructions for GARLI and for examining the output trees in PAUP. Check it out.

Tutorial

Likelihood search in GARLI Version 0.96

1. As mentioned above, GARLI runs from a config file. We need to modify this config file in a text editor to set up our analysis.
Open the garli.conf file in a text editor.
Enter "primatespaup.nex" as the datafname (data file name).
Enter "primateGarli" as your ofprefix (output file prefix).
WARNING: Be sure to change the ofprefix setting in the config file between separate runs of the same nexus file so that you don't overwrite your previous results.

If you want to run more search replicates than the default (2), change the value of searchreps.

GARLI is designed for the rapid analysis of larger datasets. As a result, the default model for analysis is GTR+I+G. However, proper model selection should still be conducted and the analysis set appropriately. Just to gain familiarity with changing the settings, let's change the settings to have an HKY+I model.

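Enter "2rate" for ratematrix (one rate for transitions and one for transversions, as in HKY).
We need to turn off gamma-distributed rate heterogeneity (remember we are doing HKY+I, with no +G), so:
Enter "none" as the ratehetmodel (model of rate heterogeneity).
Enter "1" for numratecats (the number of categories of variable rates, which must be set to 1 if ratehetmodel is none).
Leave invariantsites set to "estimate" so that the proportion of invariant sites (the +I) is still estimated.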

Save the config file.
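
If you would rather script these edits than make them by hand, the sketch below shows one way to do it. The option names and values are the ones used in this step; the set_options helper, the abbreviated sample text, and its placeholder values are hypothetical, and a real garli.conf contains many more options than shown here.

```python
import re

def set_options(conf_text, updates):
    """Return conf_text with each 'name = value' line in updates replaced."""
    remaining = dict(updates)
    out = []
    for line in conf_text.splitlines():
        match = re.match(r"\s*(\w+)\s*=", line)
        if match and match.group(1) in remaining:
            name = match.group(1)
            out.append(f"{name} = {remaining.pop(name)}")
        else:
            out.append(line)  # untouched options and section headers
    if remaining:
        raise KeyError(f"options not found: {sorted(remaining)}")
    return "\n".join(out) + "\n"

# Abbreviated, made-up stand-in for garli.conf (placeholder values).
sample = """[general]
datafname = example.nex
ofprefix = example
searchreps = 2
ratematrix = 6rate
ratehetmodel = gamma
numratecats = 4
invariantsites = estimate
"""

hky_i = set_options(sample, {
    "datafname": "primatespaup.nex",
    "ofprefix": "primateGarli",
    "ratematrix": "2rate",
    "ratehetmodel": "none",
    "numratecats": "1",
})
print(hky_i)
```

The helper raises an error if an option name is not found, which guards against silently running with a misspelled setting. To use it on a real file, read garli.conf into a string, pass it through set_options, and write the result back out.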

2. In a terminal, change to the directory containing the GARLI executable, your config file, and your dataset, then start GARLI. GARLI automatically looks for the config file and then runs the analysis on the input (NEXUS) file you specified.
./Garli0.96

On my laptop, running the 2 GARLI replicates took about 4 minutes.

AVOID GETTING TRAPPED IN LOCAL OPTIMA!!!
It is critical that you do multiple runs when conducting a GARLI analysis. By doing this, you can compare the results you obtain from the different runs to determine if any of your tree searches became trapped in local optima. If all of your runs converge on very similar lnL scores, then you can be more confident that you found the optimal tree. A common rule of thumb is to run enough analyses that you recover the best tree at least twice (and recovering the best tree more than twice is even better; if you have the time, do more runs to be confident that you are not getting trapped in local optima).
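
The comparison described above can be sketched in a few lines. The lnL values and the 0.05 tolerance below are made up for illustration; substitute the final scores reported by your own runs and a tolerance you are comfortable with.

```python
def summarize_runs(lnl_scores, tolerance=0.05):
    """Return the best lnL and the (1-based) runs within tolerance of it."""
    best = max(lnl_scores)  # lnL values are negative; closer to zero is better
    near_best = [run for run, score in enumerate(lnl_scores, start=1)
                 if best - score <= tolerance]
    return best, near_best

# Hypothetical final lnL scores from four search replicates.
scores = [-5716.42, -5716.43, -5721.87, -5716.42]
best, near_best = summarize_runs(scores)
print(f"best lnL {best}; runs at or near the best score: {near_best}")
if len(near_best) < 2:
    print("Best score found only once; consider more search replicates.")
```

Similar scores are only a hint that the runs found the same tree. You still need to compare the topologies themselves and rescore the trees, as described in the next step.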

3. You should score each tree from your multiple GARLI runs under likelihood in PAUP*. The reason is that GARLI may report identical trees with very similar (but not identical) scores. If they are identical trees, then identical scores will be recovered when branch lengths and parameters are fully optimized. So now let's open PAUP* and score our GARLI tree(s) under likelihood.

Start PAUP*.
./paup

4. Execute the NEXUS data file.

5. Change the optimality criterion to likelihood.

6. Get the trees from the GARLI run. You can load the tree from one of the .best.tre files, or load all of the trees at once if multiple runs were done sequentially from the same config file.

7. Score the tree. Be sure to set the appropriate model (here HKY+I, to match the GARLI run).
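
Steps 4 through 7 can be written out as PAUP* commands. The small sketch below just assembles and prints them so you can type or paste them at the PAUP* prompt. The commands are my reading of the steps above for an HKY+I model, and the tree file name assumes the ofprefix "primateGarli" followed by .best.tre; check both against the PAUP* documentation and your own output files before relying on them.

```python
def paup_scoring_commands(data_file, tree_file):
    """Build the PAUP* commands for steps 4-7 (HKY+I assumed)."""
    return [
        f"execute {data_file};",        # step 4: load the NEXUS data
        "set criterion=likelihood;",    # step 5: likelihood criterion
        f"gettrees file={tree_file};",  # step 6: load the GARLI tree(s)
        # step 7: HKY+I = two substitution types, estimated base frequencies,
        # no gamma rate variation, estimated proportion of invariant sites
        "lset nst=2 tratio=estimate basefreq=estimate rates=equal pinvar=estimate;",
        "lscores all;",
    ]

commands = paup_scoring_commands("primatespaup.nex", "primateGarli.best.tre")
print("\n".join(commands))
```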

Nonparametric bootstrapping in GARLI

Conducting a bootstrap in GARLI is very easy. Follow the instructions above for setting up a run. Additionally, in the config file, set bootstrapreps to the number of bootstrap replicates you would like.
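
For anyone unsure what a bootstrap pseudoreplicate is, here is a minimal sketch of the resampling idea: alignment columns are drawn with replacement until the new dataset has as many sites as the original. GARLI does this internally, so you never need to run code like this; the taxon names and sequences are made up.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping rows aligned."""
    n_sites = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in picks)
            for taxon, seq in alignment.items()}

# Made-up toy alignment: three taxa, eight sites.
toy = {"taxonA": "ACGTACGT", "taxonB": "ACGTTCGA", "taxonC": "ACCTACGA"}
rng = random.Random(42)
pseudoreplicates = [bootstrap_alignment(toy, rng) for _ in range(3)]
for rep in pseudoreplicates:
    print(rep)
```

Each pseudoreplicate is then analyzed with a full tree search, which is why bootstrapping takes much longer than a single search.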

The best tree from each bootstrap pseudoreplicate dataset will be written to a single file called <ofprefix>.boot.tre. These trees can then be read into PAUP* and a majority-rule consensus tree can be computed, as shown in the PAUP* tutorial. (It is not necessary to specify the use of treeweights in PAUP* when importing the trees because you will only have 1 tree per pseudoreplicate.)
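
The idea behind the majority-rule consensus is simple counting, sketched below. This is not how PAUP* is implemented, and it does not read tree files: each toy tree is written by hand as a set of clades over made-up taxa A to E, and rooted clades are used instead of unrooted bipartitions to keep the example short.

```python
from collections import Counter

def clade_support(trees):
    """Fraction of trees that contain each clade."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: n / len(trees) for clade, n in counts.items()}

def majority_rule(trees, threshold=0.5):
    """Clades found in more than `threshold` of the trees."""
    return {clade: support
            for clade, support in clade_support(trees).items()
            if support > threshold}

# Four hypothetical bootstrap trees, each listed as its non-trivial clades.
trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("CD")},
    {frozenset("AC"), frozenset("ABC")},
]
for clade, support in majority_rule(trees).items():
    print("".join(sorted(clade)), f"{support:.0%}")
```

The support values are the numbers usually drawn on the branches of a bootstrap consensus tree.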
