Phylogeny

This web page was produced as an assignment for Genetics 564, an undergraduate course at UW-Madison

What is phylogeny?

Phylogeny is the relationship between organismal lineages as they evolve through time [2]. Organisms can be considered to be connected based on the passage of genes throughout evolutionary history, and these connections can be used to create trees representing similarity between organisms. Phylogenetic trees can be considered an effort to retrace evolutionary history through mathematical models that determine similarity between characteristics shared between organisms [3].

To generate phylogenetic trees, programs such as Clustal W2, MUSCLE, or T-COFFEE will align amino acid sequences of proteins and identify regions of similarity and divergence. After a multiple sequence alignment, programs will use one of two techniques for measuring the closeness between sequences in the alignment. Percent identity calculates the percentage identity of sequences at each aligned position. BLOSUM creates a score matrix that assigns each amino acid in a sequence a score according to its frequency. Additionally, programs can use methods to refine tree structure. For instance, the neighbor joining method of programs is employed to create trees with the shortest branch lengths. In contrast, the average distance methods constructs trees that group similar sequences in to each node, and branch length reflects the shared sequence between branches within the node.

Neighbor Joining tree using percent identity

Tree obtained using Clustal W2. Tree verified using MUSCLE.

Average distance using BLOSUM62

Tree obtained using Clustal W2 and verified using MUSCLE

Average Distance Tree using percent identity

Tree obtained using Clustal W2 and verified using MUSCLE

Neighbor joining tree using BLOSUM62

Tree identified using Clustal W2 and verified using MUSCLE

Analysis

These phylogeny trees yielded unexpected results. Organisms that are often considered to be evolutionarily closely related (such as humans and chimpanzees) are not always grouped in to the same node. Additionally, the trees also often yielded phylogenetic trees with great discrepancies between each other. This could be because of the differences in domain structure in polycystin-1 between organisms. Polycystin-1 is a large protein with many exons and domains -- closely related organisms might simply have a different overall domain composition but similar functional domain composition. Alternately, these differences could be due to splicing differences between organisms. More specific analysis of these differences in domain structure or splicing might yield insight in to the functional significance of each part of the particularly large polycystin-1 protein.

References
[1] Banner image: http://www.nature.com/nature/journal/v465/n7300/fig_tab/nature09113_F3.html
[2] Tree of life: what is phylogeny?
[3] Delsuc, F., Brinkmann, H., Herve Philippe. (2005). Phylogenomics and the reconstruction of the tree of life. Nature Reviews Genetics. 6, 361-375. doi:10.1038/nrg1603

Site created by: Elizabeth Roeske
Last updated: 5.14.2014
University of Wisconsin-Madison: Genetics 564