We are searching data for your request:

**Forums and discussions:**

**Manuals and reference books:**

**Data from registers:**

**Wait the end of the search in all databases.**

Upon completion, a link will appear to access the found materials.

Upon completion, a link will appear to access the found materials.

Hey guys can you help me with this one!?

What happens between two nodes (in an evolutionary tree) or a between a node and a tip?

A) at least one character changes

B) all organisms die

C) Nothing can happen between two nodes

D) Monophyly, Paraphyly, Polyphyly

E) None of the above

I came down to E) and C). Since B) doesnt even make sense, organisms don't die among two nodes- they evolve. D) it is irrelevant to the question asked A) I think that probably is wrong as well, since among a node and a tip, the organism stays the same and finally I thought C) could be the answer, but at the same time a new trait could be formed,

The question does not make much sense and I think you've gone about as far as anyone can go.

A) at least one character changes

It depends what it means by character and it depends the detail semantic of the tree. In general, an evolutionary tree, usually called a phylogenetic tree attempts at representing time by the length of the edges in between two nodes. The goal is, in general, not to represent anything else.

In the comments, @John has argued that time can be consider as a character. Personally, when I read the term "character" I think of a phenotypic character. While it will pretty much always be true that among two nodes some difference will exist in the phenotypic character, it is not a fundamental consequence of the concept of evolutionary tree. This is why I argued that A is false. @John takes the opposite stance in his answer. Note also, that sometimes (although quite rarely today) phenotypic states are used to build a tree and hence, in such case, it would be a direct consequence of the methodology that two nodes must necessarily differ by the value of at least one phenotypic character.

This disagreement between @John and I highlights that the question is poorly phrased and actually makes little sense.

B) all organisms die

Well, most likely enough time separate the two nodes that all organisms that were represented in the ancestral node are dead but of course they have reproduced, so that we actually have another node that descend from it.

C) Nothing can happen between two nodes

As you said, they evolve. So, things happen!

D) Monophyly, Paraphyly, Polyphyly

As you said, those terms make no sense here!

E) None of the above

If there is any intuitive answer to the question`what has happened between two nodes`

is`time has passed`

. I think the answer your teacher was expecting here is`E`

but it is also possible (s)he was expecting A. Really that's a pretty poorly written question.

Your answer is **A**. for one simple reason, two different nodes represent at least one notable/measurable difference between groups, so between any two nodes there must be at least one difference.

Look at this example, whether A and B represents Mammals and Birds or two subspecies of chimp, or two strains of virus the fact they are two separate nodes means there is some difference between them. When constructing a tree there has to be at least one character difference to have more than one node. A tree with no differences is just a dot by itself.

Also an organisms does not stay the same between a node an the tip (the tip is a node) to use the below example the starred node between A and B is the most recent common ancestor of A and B which is not the same thing as A or B.

## Visualizing and Annotating Phylogenetic Trees with R+ggtree

This lesson demonstrates how to use **ggtree**, an extension of the ggplot2 package to visualize and annotate phylogenetic trees. Many of the examples here were modified from the ggtree vignettes.

This lesson does *not* cover methods and software for *generating* phylogenetic trees, nor does it it cover *interpreting* phylogenies. **Here’s a quick primer on how to read a phylogeny** that you should definitely review prior to this lesson, but it is by no means extensive. Genome-wide sequencing allows for examination of the entire genome, and from this, many methods and software tools exist for comparative genomics using SNP- and gene-based phylogenetic analysis, either from unassembled sequencing reads, draft assemblies/contigs, or complete genome sequences. These methods are beyond the scope of this lesson.

Neighbor joining takes as input a distance matrix specifying the distance between each pair of taxa. The algorithm starts with a completely unresolved tree, whose topology corresponds to that of a star network, and iterates over the following steps until the tree is completely resolved and all branch lengths are known:

- Based on the current distance matrix calculate the matrix Q
(defined below). - Find the pair of distinct taxa i and j (i.e. with i ≠ j
) for which Q ( i , j ) has its lowest value. These taxa are joined to a newly created node, which is connected to the central node. In the figure at right, f and g are joined to the new node u. - Calculate the distance from each of the taxa in the pair to this new node.
- Calculate the distance from each of the taxa outside of this pair to the new node.
- Start the algorithm again, replacing the pair of joined neighbors with the new node and using the distances calculated in the previous step.

### The Q-matrix Edit

### Distance from the pair members to the new node Edit

For each of the taxa in the pair being joined, use the following formula to calculate the distance to the new node:

δ ( g , u ) = d ( f , g ) − δ ( f , u )

### Distance of the other taxa from the new node Edit

For each taxon not considered in the previous step, we calculate the distance to the new node as follows:

### Complexity Edit

a | b | c | d | e | |
---|---|---|---|---|---|

a | 0 | 5 | 9 | 9 | 8 |

b | 5 | 0 | 10 | 10 | 9 |

c | 9 | 10 | 0 | 8 | 7 |

d | 9 | 10 | 8 | 0 | 3 |

e | 8 | 9 | 7 | 3 | 0 |

### First step Edit

#### First joining Edit

a | b | c | d | e |
---|---|---|---|---|

a | −50 | −38 | −34 | −34 |

b | −50 | −38 | −34 | −34 |

c | −38 | −38 | −40 | −40 |

d | −34 | −34 | −40 | −48 |

e | −34 | −34 | −40 | −48 |

#### First branch length estimation Edit

#### First distance matrix update Edit

The resulting distance matrix D 1

u | c | d | e | |
---|---|---|---|---|

u | 0 | 7 | 7 | 6 |

c | 7 | 0 | 8 | 7 |

d | 7 | 8 | 0 | 3 |

e | 6 | 7 | 3 | 0 |

### Second step Edit

#### Second joining Edit

u | c | d | e |
---|---|---|---|

u | −28 | −24 | −24 |

c | −28 | −24 | −24 |

d | −24 | −24 | −28 |

e | −24 | −24 | −28 |

#### Second branch length estimation Edit

The joining of the elements and the branch length calculation help drawing the neighbor joining tree as shown in the figure.

#### Second distance matrix update Edit

d ( v , d ) = 1 2 [ d ( u , d ) + d ( c , d ) − d ( u , c ) ] = 7 + 8 − 7 2 = 4

v | d | e | |
---|---|---|---|

v | 0 | 4 | 3 |

d | 4 | 0 | 3 |

e | 3 | 3 | 0 |

### Final step Edit

The tree topology is fully resolved at this point. However, for clarity, we can calculate the Q 3

Q 3 ( v , e ) = ( 3 − 2 ) d ( v , e ) − ∑ k = 1 3 d ( v , k ) − ∑ k = 1 3 d ( e , k ) = 3 − 7 − 6 = − 10

v | d | e |
---|---|---|

v | −10 | −10 |

d | −10 | −10 |

e | −10 | −10 |

The neighbor joining tree is now complete, as shown in the figure.

### Conclusion: additive distances Edit

Neighbor joining may be viewed as a greedy heuristic for the Balanced Minimum Evolution [5] (BME) criterion. For each topology, BME defines the tree length (sum of branch lengths) to be a particular weighted sum of the distances in the distance matrix, with the weights depending on the topology. The BME optimal topology is the one which minimizes this tree length. Neighbor joining at each step greedily joins that pair of taxa which will give the greatest decrease in the estimated tree length. This procedure does not guarantee to find the optimum for the BME criterion, although it often does and is usually quite close.

The main virtue of NJ is that it is fast [6] : 466 as compared to least squares, maximum parsimony and maximum likelihood methods. [6] This makes it practical for analyzing large data sets (hundreds or thousands of taxa) and for bootstrapping, for which purposes other means of analysis (e.g. maximum parsimony, maximum likelihood) may be computationally prohibitive.

Neighbor joining has the property that if the input distance matrix is correct, then the output tree will be correct. Furthermore, the correctness of the output tree topology is guaranteed as long as the distance matrix is 'nearly additive', specifically if each entry in the distance matrix differs from the true distance by less than half of the shortest branch length in the tree. [7] In practice the distance matrix rarely satisfies this condition, but neighbor joining often constructs the correct tree topology anyway. [8] The correctness of neighbor joining for nearly additive distance matrices implies that it is statistically consistent under many models of evolution given data of sufficient length, neighbor joining will reconstruct the true tree with high probability. Compared with UPGMA and WPGMA, neighbor joining has the advantage that it does not assume all lineages evolve at the same rate (molecular clock hypothesis).

Nevertheless, neighbor joining has been largely superseded by phylogenetic methods that do not rely on distance measures and offer superior accuracy under most conditions. [* citation needed *] Neighbor joining has the undesirable feature that it often assigns negative lengths to some of the branches.

There are many programs available implementing neighbor joining. RapidNJ and NINJA are fast implementations with typical run times proportional to approximately the square of the number of taxa. BIONJ and Weighbor are variants of neighbor joining which improve on its accuracy by making use of the fact that the shorter distances in the distance matrix are generally better known than the longer distances. FastME is an implementation of the closely related balanced minimum evolution method.

## Measuring the Distance Between Networks

The *network comparison* problem derives from the *graph isomorphism* problem. Two (undirected, unweighted) graphs *G*_{1}(*V*_{1}, *E*_{1}) and *G*_{2}(*V*_{2}, *E*_{2}) are isomorphic if there exists a one-to-one correspondence Φ mapping the node set *V*_{1} onto *V*_{2} such that the edge (*u*, *v*) ∈ *E*_{1} if and only if the edge (Φ(*u*), Φ(*v*)) ∈ *E*_{2} 9 . The complexity of the *graph isomorphism problem*, i.e., checking whether two finite graphs are isomorphic, is unknown in rigorous terms 10,11,12 : nonetheless, efficient algorithms exist for many classes of graphs 13 . In any case, isomorphism is an *exact graph matching*: if used as a distance for comparison, it gives a binary outcome: the graphs are either isomorphic, i.e., identical, or not. This information is poor, however, because networks are almost never identical in applications, and one is interested in assessing to what extent they are similar. To effectively compare networks, we need to move to *inexact graph matching*, i.e., define a real-valued distance which, as a minimal requirement, has the property of converging to zero as the networks approach isomorphism.

Searching for accurate and effective tools to compare networks has pushed the research in many different directions, leading to a wide variety of methods and algorithms. We present a short review of several of the most used approaches to network comparison, acknowledging that the literature is very abundant and we cannot cover it exhaustively neither want to repeat already established results. Following previous approaches 6 , we partition the comparison methods based on whether the induced distances are dependent from the correspondence of nodes. In the former case–*Known Node-Correspondence* (*KNC*)–the two networks have the same node set (or at least a common subset), and the pairwise correspondence between nodes is known. Thus, typically, only graphs of the same size and coming from the same application domain can be compared. In the latter case–*Unknown Node-Correspondence* (*UNC*)–ideally any pair of graphs (even with different sizes, densities, or coming from different application fields) can be compared: typically these methods summarize the global structure into one or more statistics, which are then elaborated to define a distance. This latter notion of distance thus reflects the difference in the global structure of the networks.

For example, imagine that one wants to compare the European air transportation networks of the airlines A and B. The node sets (i.e., the European airports) are the same, thus a KNC method can be applied to quantify to what extent the two sets of edges are similar, i.e., to what extent the two airlines offer the same set of flights. If the exercise is (pairwise) extended to all airlines, the overall results will allow one to cluster airlines supplying similar sets of connections. But the same dataset could alternatively be analysed, with a different aim, by a UNC method, to highlight pairs of airlines whose network is globally structurally similar. Then, for example, one might discover that airlines A and B have both a markedly star-like flight network, but the first is based in Amsterdam (the centre of the star) and the second in Berlin. Here, extending the analysis to all airlines might cluster sets of airlines with similar strategies or business models.

A screening of the literature reveals that the latter problem is far more studied–and that biology is the most popular application field–so that the number of available UNC methods is much larger than that of the KNC ones. Below we present several methods used for network comparison, briefly explaining the details of their approach. In doing that, we will mostly privilege methods of general use, that is, applicable to directed and weighted networks too–this significantly narrows the set of candidates. In the next section, we will numerically compare the performance of a subset of these methods.

### Known node-correspondence (KNC) methods

#### Difference of the adjacency matrices

The simplest and näive measures are obtained by directly computing the difference of the adjacency matrices of the two networks. Then any matrix norm can be used, e.g., Euclidean, Manhattan, Canberra, or Jaccard 14 . All of them are suitable to compare all kinds of graphs (directed or not, weighted or not), with the exception of the Jaccard distance which needs to be extended to the Weighted Jaccard distance 15 (the definitions of the four norms are recalled in the Supplementary Information file, Sec. S1). Although this straightforward approach is rarely used in network comparison, we include it in the pool and consider it as a baseline approach.

#### DeltaCon

It is based on the comparison of the similarities between all node pairs in the two graphs 16,17 . The similarity matrix of a graph is defined by *S* = [*s*_{ij}] = [*I* + *ε* 2 *D* − *εA*] −1 , where *A* is the adjacency matrix, *D* = diag(*k*_{i}) is the degree matrix, *k*_{i} is the degree of node *i*, and *ε* > 0 is a small constant. The rationale of the method is that just measuring the overlap of the two edge sets might not work in practice, because not all edges have the same importance. Instead, the difference between *r*-step paths, *r* = 2, 3, …, provides a much more sensitive measure. As a matter of fact, it can be shown 16 that *s*_{ij} depends on all the *r*-paths connecting (*i*, *j*). The DeltaCon distance between the *N* × *N* similarity matrices *S* 1 = [*s*_{ij} 1 ] and *S* 2 = [*s*_{ij} 2 ] is finally defined using the Matusita distance:

Equation (1) assures that DeltaCon satisfies the usual axioms of distances. Moreover, it can be shown 16,17 that it also satisfies a few desirable properties regarding the impact of specific changes. Such properties are: changes leading to disconnected graphs are more penalized in weighted graphs, the bigger the weight of the removed edge is, the greater the impact on the distance a change has more impact in low density graphs than in denser graphs with equal size random changes produce smaller impacts than targeted ones.

The computational complexity of the DeltaCon algorithm is quadratic in the number of nodes. To improve execution speed, an approximated version was proposed, which restricts the computation of the similarity matrices to groups of randomly chosen nodes 16 : this version has linear complexity in the number of edges and groups. Finally, DeltaCon was extended 17 to make it able to find which nodes or edges are most responsible for the differences between the two graphs.

#### Cut distance

This method 12 is based on the notion of cut weight, which is standard in graph theory and also used in community detection 4 . Given a (possibly directed, weighted) graph *G* = (*V*, *E*) with edge weights *w*_{ij}, *i*, *j* ∈ *V*, and two disjoint node sets *S*, *T* ⊂ *V*, the cut weight is defined as (*S* to *T*. The cut distance between two graphs *G*_{1}(*V*, *E*_{1}) and *G*_{2}(*V*, *E*_{2}) with the same node set is then defined as

where *S* *C* = *V**S*. Thus two networks are similar if they have similar cut weight for all possible network bipartitions. The maximization is performed through genetic algorithms, which makes the comparison of large networks (thousands of nodes and larger) unfeasible. On the other hand, this is one of the few methods able to compare directed, weighted graphs.

### Unknown node-correspondence (UNC) methods

#### Global statistics

Simple metrics can be obtained by comparing the value of network statistics, such as the clustering coefficient 18,19,20,21 , the diameter 19,21 , or the average distance 18 . Although being intuitive and typically computationally efficient, often this approach does not yield robust results. As a matter of fact, similar values of network statistics do not necessarily imply similar network structures (e.g., see the discussion in ref. 22 ) and indeed, the comparison often fails in catching important local features. On the other hand, these simple metrics offer a computationally low-cost alternative that can be useful for a first analysis.

#### Mesoscopic response functions (MRFs)

This method exploits the information carried by the mesoscopic properties of the networks, i.e., their modular structure 23 . Three functions–called MRFs–are defined, the Hamiltonian *H*(*λ*), the partition entropy *S*(*λ*), and the number of communities *η*(*λ*), which describe the properties of a given network at different mesoscopic scales: the parameter *λ* tunes the fragmentation of the network into communities. The network distance is defined for a given set of graphs: for each network pair, the distances between corresponding MRFs are defined by standard function metrics, then the first principal component obtained from PCA is taken as distance. This entails non-comparability among different datasets, since the distance between two networks depends on the dataset they are part of. On the other hand, it is the only available method based on mesoscale properties, and allows one to consider both weighted and unweighted undirected networks. The computational efficiency of the method mostly depends on the efficiency of the community detection algorithm used.

#### Graphlet-based methods

Graphlets are small, connected, non-isomorphic subgraphs of large networks. Originally proposed for undirected (unweighted) networks 18 , their use has been subsequently extended to directed networks 24,25 . They encode important information about the structure of the network and provide a valuable tool for comparison. The different types of graphlets need to be enumerated, and this can be done in two ways, i.e., by taking into account or not their automorphism orbits 22 , which differentiate the roles of the nodes in each graphlet (see Fig. 1, where graphlets are enumerated from *G*_{0} to *G*_{29} and orbits from *O*_{0} to *O*_{72}). Usually graphlets with more than five nodes are not considered, both for computational reasons and due to repetition of smaller graphlets within their structure.

Graphlets (2- to 5-node) in undirected, unweighted networks (from ref. 22 ). The 30 graphlets defined by Pržulj *et al*. 18 are labeled *G*_{0} to *G*_{29}. In each graphlet, nodes with the same shading belong to the same automorphism orbit *O*_{0} to *O*_{72}, i.e., they have the same characteristics and are indistinguishable to each other 22 .

Counting all graphlets of a network is, in principle, a very demanding task: given a graph with *N* nodes and *L* edges, the worst-case running time for counting 2- to *k*-node graphlets (for both the undirected and directed case) with a complete enumeration strategy is *O*(*N* *k* ): a tighter upper bound gives *O*(*Nk*_{max} *k*−1 ), where *k*_{max} is the maximum node degree of the graph 20,24 . In practice, these pessimistic bounds are never reached: thanks to the sparsity of real-world networks, and exploiting wiser counting strategies, large improvements are possible. Hočevar and Demšar 26 proposed the ORCA algorithm, based on a particular counting strategy. Its complexity is *O*(*Lk*_{max} + *T*_{4}) for the enumeration of 2- to 4-node graphlets, and *O*(*Lk*_{max} 2 + *T*_{5}) for 2- to 5-node graphlets, where *T*_{4} and *T*_{5} are terms which are negligible in most situations. Aparicio, Ribeiro and Silva 25 proposed another approach based on a particular data structure, the *G-Trie* 27 : it demonstrated higher performances with respect to ORCA, but its theoretical upper bound is not provided.

Graphlet-based network distances are based on graphlet counts, which can be organized in several ways:

*Relative Graphlets Frequency Distance* (*RGFD*) 18 . The 29 graphlets from 3 to 5 nodes are counted in each network. Then the distance is defined as (d(*^<29>,| _*

*(*_<1>)-_

*(*_<2>)|) , where

*F*

_{i}( ⋅ ) denotes the count of graphlet

*i*normalized with respect to the total number of graphlets in the graph.

* *

*Graphlet Degree Distribution Agreement* (*GDDA*) 22 . The 73 automorphism orbits of graphlets from 2 to 5 nodes are counted in each network *G*. For each orbit *j* = 0, 1, …, 72, the graphlet degree distribution (GDD) *d*_{G} *j* (*k*), which is the number of nodes in *G* touching *k* times that orbit, is computed. This quantity is first scaled as *d*_{G} *j* (*k*)/*k*, and then normalized by the total area *T*_{G} *j* under the *j*-th GDD, obtaining *N*_{G} *j* (*k*) = (*d*_{G} *j* (*k*)/*k*)/*T*_{G} *j* . Then, the agreement of the *j*-th GDD between networks *G*_{1} and *G*_{2} is defined as

and the final GDDA distance is taken as the geometric or arithmetic mean of all the 73 agreements *A* *j* (*G*_{1}, *G*_{2}).

*Graphlets Correlation Distance* (*GCD*): Yaveroglu *et al*. 19 investigated the dependencies among graphlets and found that some orbits are redundant, i.e., their count can actually be obtained from the counts of other orbits. Discarding the redundant orbits led to the definition of a more sensible and efficient measure. For instance, graphlets up to 4 nodes have 4 redundant orbits, and discarding them reduces the number of orbits to 11 from the original 15. For a given *N*-node network *G*, the *N graphlets degree vectors* 28 , i.e., the count of the considered orbits that each node touches, are appended row by row to form a *N* × 11 matrix. Then, the Spearman’s correlation coefficient is computed between all column pairs, obtaining the 11 × 11 *Graphlet Correlation Matrix GCM*_{G}. The GCD distance between graphs *G*_{1} and *G*_{2} is finally defined as the Euclidean distance between the upper triangular parts of the matrices *GCM*_{G1} and *GCM*_{G2}. Yaveroglu *et al*. showed that GCD-11 (the distance with non-redundant orbits) outperformed GCD-15 (with redundant orbits), but also GCD-73 and GCD-56, the distances based on 5-node graphlets with or without reduntant orbits, respectively. Also, GCD-11 performed better than other graphlet-based distances in recognizing different network models.

*NetDis* 29 : It compares graphlet counts in overlapping neighbourhoods of nodes, rather than in the entire network: more specifically, it considers the 2-step ego-network of each node. The rationale comes from the observation that network size and density strongly influence global graphlet counts, and that such effect can be attenuated by restricting to local subnetworks. Graphlet counts (from *G*_{0} to *G*_{29}) are computed for each ego-network and then normalized with respect to the expected counts from a null model. Denote by *S*_{w}(*G*) the sum over all ego-networks of the normalized count of graphlet *w* in graph *G*. Then, for a given size *k* ∈ <3, 4, 5>of the graphlets:

where *M*(*k*) is a normalizing constant forcing *netD*_{2} *S* (*k*) ∈ [−1, 1]. Finally, the NetDis distance for *k*-node graphlets is defined as

Note that the NetDis measure actually depends on *k*, which is therefore a parameter to be selected. Yaveroglu *et al*. 20 pointed out a few critical aspects of the method, such as the choice of a null model, the computational efficiency, and the performances, which are overall inferior to those of other graphlet-based distances.

*GRAFENE* 21 . In this method, the graphlet degree vectors for graphlets *G*_{0} to *G*_{29} are first computed for each node, and then scaled in [0, 1] dividing each component by the total counts of the corresponding graphlet in the whole network. Principal Component Analysis is then performed over the rescaled graphlet degree vectors, and the first *r* components that account for at least 90% of the total variability are kept. The distance between the two networks is defined as 1 − *d* *cos* (*R*_{1}, *R*_{2}), where *d* *cos* is the cosine similarity and *R*_{1}, *R*_{2} are the first *r* principal components for the two graphs. The use of PCA is a novel idea within the graphlet-based methods, that improves the quality of results and the computational performances. Tests on synthetic networks showed that GRAFENE performs at least as well as the other alignment-free methods, and outperforms all other methods on real networks 21 .

Multiple approaches 24,25 have extended the applicability of graphlet-based methods introducing *directed graphlets*, with the aim of comparing directed (yet unweighted) networks. This allowed to define directed versions of a few of the existing distances 24 : the *Directed Relative Frequency Distance* (*DRGFD*), the *Directed Graphlet Degree Distribution Agreement* (*DGDDA*) and the *Directed Graphlets Correlation Distance* (*DGCD*).

#### Alignment-based methods

These methods create a mapping between the nodes of two graphs (*alignment* procedure) trying to maximize an objective function that captures the quality of matching. They originated in computational biology, where a number of algorithms have been proposed (e.g., refs. 30,31,32 ): most of them, however, rely on the biological information associated to the network and, as such, have not general applicability. The GRAAL (Graph Aligner) family (GRAAL 33 , H- 34 , MI- 11 , C- 35 , and L-GRAAL 36 ), on the contrary, performs network alignment based on the network topology (except the most recent L-GRAAL, which can exploit both biology and topology). Given the graphs *G*_{1} = (*V*_{1}, *E*_{1}) and *G*_{2} = (*V*_{2}, *E*_{2}) with |*V*_{1}| ≤ |*V*_{2}|, a node-mapping *f*:*V*_{1} → *V*_{2} is defined, which induces an edge-mapping *g*:*V*_{1} × *V*_{1} → *V*_{2} × *V*_{2} such that *g*(*E*_{1}) = <(*f*(*u*), *f*(*v*)): (*u*, *v*) ∈ *E*_{1}>. The objective function that measures the quality of the node alignment *f* is then expressed by the *edge correctness*

i.e., the fraction of edges in *E*_{1} aligned to edges in *E*_{2}. Solving the alignment problem boils out to finding *f* that maximizes *EC*. The GRAAL family proposes a number of different approaches: GRAAL and H-GRAAL assign a similarity score to each pair (*u*_{1}, *u*_{2}) ∈ *V*_{1} × *V*_{2} based on the graphlet degrees (see above) of the two nodes, thus accounting for the similarity of their neighbourhoods. Then for the alignment GRAAL uses a greedy “seed-and-extend” algorithm trying to maximize the aggregate similarity, while H-GRAAL solves the same problem as an optimal assignment (Hungarian algorithm). In MI-GRAAL a confidence score, computed by taking into account various node statistics (e.g., degree, clustering coefficient, etc.) is assigned to each pair (*u*_{1}, *u*_{2}) ∈ *V*_{1} × *V*_{2}, then nodes are aligned starting from the pairs with highest score. The method is customizable, since any node statistics can be used to compute the confidence score–this allows one the comparison of directed and weighted networks. C-GRAAL does not require an explicit node similarity measure (which however can be incorporated, if available) but works on networks topology with an iterative common-neighbors-based algorithm. Finally, L-GRAAL optimizes a novel objective function that takes into account both sequence-based protein conservation and graphlet-based interaction conservation, by using an alignment heuristic based on integer programming and Lagrangian relaxation. The main drawback of these methods is their computational efficiency, which scales at least quadratically with the number of nodes.

#### Spectral methods

Here the rationale is that, since the spectrum of the representation matrix of a network (adjacency or Laplacian matrix) carries information about its structure, comparing spectra provides metrics for comparing networks. Different approaches are used: Wilson and Zhu 37 proposed to simply take the Euclidean distance between the two spectra, while Gera *et al*. 38 proposed to take as distance the *p*-value of a nonparametric test assessing whether the two spectra come from the same distribution. Despite the ease of use, spectral methods prove to suffer from many drawbacks, including cospectrality between different graphs, dependence on the matrix representation, and abnormal sensitivity (small changes in the graph’s structure can produce large changes in the spectrum).

#### NetLSD (Network laplacian spectral descriptor)

This method 39 summarizes the features of the (undirected, unweighted) graph *G* by a vector derived from the solution of the “heat equation” ∂*u*_{t}/∂*t* = −*Lu*_{t}, where *u*_{t} is an *N*-dimensional vector and *L* = *I* − *D* −1/2 *AD* −1/2 is the normalized Laplacian matrix. This resembles the dynamics of a continuous-time random walker (e.g., ref. 40 ) so that the solution is a pagerank-like centrality 41 . *L* is symmetric and can be written as *L* = ΦΛΦ T via spectral decomposition, hence the closed-form solution is given by the *N* × *N* “heat kernel” matrix

whose entry (*H*_{t})_{ij} is the heat transferred from node *i* to *j* at time *t*. NetLSD condenses the graph representation in the *heat trace signature*

The continuous-time function *h*_{t} is finally transformed into a finite-dimensional vector by sampling over a suitable time interval, and the distance between two networks *G*_{1}, *G*_{2} is taken as the norm of the vector difference between *h*(*G*_{1}) and *h*(*G*_{2}). The competitive performance of NetLSD is demonstrated in a few machine learning classification tasks. The time complexity is *O*(*N* 3 ), if the full eigendecomposition of the Laplacian is carried out. This would limit the scope of applicability to a few thousands nodes: for that, approximation schemes are devised for reconstructing the spectrum after computing only a limited number of eigenvalues. In this way, networks up to 10 6 nodes can be treated. Although not discussed in the paper, the method is readily applicable to weighted graphs, whereas the extension to directed networks appears non trivial due to the different spectral characteristics.

#### Portrait divergence

It is a recent method 42 based on a graph invariant which encodes the distribution of the shortest-path lengths in a graph: the *network portrait* 43 is a matrix *B* whose entry *B*_{lk}, *l* = 0, 1, …, *d* (*d* is the graph diameter), *k* = 0, 1, …, *N* − 1, is the number of nodes having *k* nodes at shortest-path distance *l*. The definition also extends to directed and weighted networks–in the weighted case, a binning strategy is needed to manage real-valued path lengths. The network portrait is a powerful summary of the topological features of the graph–e.g., the number of nodes and edges, the degree distribution, the distribution of the next-nearest neighbours, and the number of shortest paths of length *l* can straightforwardly be recovered from *B*. The Portrait Divergence distance between graphs *G*_{1} and *G*_{2} is then defined as follows. First, the probability *P*(*k*, *l*) (and similarly *Q*(*k*, *l*) for the second graph) of randomly choosing two nodes at distance *l* and, for one of the two nodes, to have *k* nodes at distance *l*, is computed:

where *n*_{c} is the number of nodes in the connected component *c*. Then, the Portrait Divergence distance is defined using the Jensen-Shannon divergence:

where *M* = (*P* + *Q*)/2 is the mixture distribution of *P* and *Q*, and *KL*( ⋅ || ⋅ ) is the Kullback-Liebler divergence. The method is computationally efficient for small and medium size graphs, since it is quadratic in the number of nodes, and can naturally handle disconnected networks.

#### Graph kernels

A graph kernel *k*(*G*_{1}, *G*_{2}) is a non-negative function of two feature vectors *f*_{1}, *f*_{2}, respectively representative of the two graphs *G*_{1}, *G*_{2}. In abstract form, a graph kernel implements a (generalized) inner product of the two graphs, which is taken as a measure of their similarity. The proposal of using kernel methods for graph comparison is attributed to refs. 44,45 - see also refs. 46,47 for a unified description and survey.

The approach is very general, as feature vectors can be defined in very different forms. A recent paper 48 , introducing an R/Python implementation, summarizes 14 different kernel types among the most popular ones: the majority of them are based, in different forms, on statistics on node/edge labels (thus they fall out of the scope of our work, as we do not assume labels on nodes/edges). Two of them are based on graphlet count, and the remaining on the comparison of random walks on the two graphs. But this list is by no means exhaustive, as many other proposals are found in the literature (e.g., ref. 49 defines a Laplacian graph kernel). Therefore, graph kernel methods can be considered as a general framework where diversified network features can be included. It follows that selecting the proper graph kernel for the problem at hand can be critical, also considering the non trivial computational requirements of this class of methods: we refer the reader to the discussion in ref. 47 .

#### Bayes’ modeling of a network population

The network comparison problem can be addressed using a Bayesian nonparametric approach. Durante *et al*. 50 proposed a mixture model to describe a population of networks, interpreted as realizations of a network-valued random variable, and in particular to infer the parameters of the probability mass function of such variable. This is not a distance-based method, since no explicit distance is defined to compare networks. Instead, the use of a Dirichlet process prior naturally yields a clustering of the input networks thanks to the discreteness of the resulting measure.

#### Persistent homology

Homology is an algebraic-topological measurement of the structure of an undirected unweighted network which, based on the number and dimension of cliques and cycles, exploits the information carried by the mesoscale structure of the network. The generalization to weighted graphs is possible by considering *persistent homology*, which tracks the evolution of cycles when a sequence of unweighted graphs is obtained by thresholding the network at distinct edge weights. This technique was used by Sizemore *et al*. 51 , where suitable quantities derived from persistent homology are jointly used as features to classify networks via hierarchical clustering, showing a superior performance with respect to using standard graph statistics. In the paper, however, no network distance is explicitly defined.

## Divergent Evolution

As the name implies, divergent evolution shows how species can change slightly over time and separate (diverge) into new forms. For example, in vertebrates like pigs, birds, monkeys and whales, the forelimbs have the same general sets of bones, but they have been modified over time so the animals can use their forelimbs in very different ways. Divergent evolution is studied on a larger scale such as how the current diversity of life on Earth evolved from the first living cells, to a smaller scale where natural selection caused humans and apes to evolve from a common ancestor.

The image above shows how the beak sizes and shapes of finches that live on the Galapagos Islands (Darwin’s Finches) have diverged over time in response to natural selection pressures from the competition for food.

## Implementation of the Stream and Assessment of Student Learning

We implemented the comparative biology stream during fall semester 2006 (F06) and spring semester 2007 (S07) using the instructional sequence described above. During F06, assessment of student learning in the comparative biology stream included a lab quiz that had a question about mapping characters and mirrored the work the students had done with the skeletons during week one of the stream. In addition, the LB144 final exam included a section on comparative biology. One of the extended response questions on this exam asked students to map the following characters onto a phylogenetic tree that had phylum names as terminal taxa: (a) true tissues (b) radial and bilateral symmetry (c) acoelomate, pseudocoelomate, and eucoelomate body plans and (d) protostome and deuterostome embryological development. This exam question was almost identical to the question/task completed as a lab team during week four of the comparative biology stream (Fig. 5).

During S07, assessment of student learning included a lab quiz. However, we also gave a stand-alone hour exam that included a section on comparative biology. Our (the teaching team’s) response to this exam was a sense that the students still did not “get it” with respect to mapping specified characters onto morphology-based trees. Therefore, in preparation for the final exam, we prepared an Animal Problem Study Guide that we handed out in class (not shown). We then asked the mapping question again on the final exam, but without the requirement to map the acoelomate, pseudocoelomate, and eucoelomate body plans (Fig. 5).

Student learning of tree-thinking apparently was better during S07 than S06. On the F06 final exam, the students scored an average of 3.86/6.00 (±1.66), or 64.3%, on the characteristic-mapping question (Fig. 5) during S07, the mean score was 4.45/6.00 (±1.76), or 74.2%. Among the possible reasons for the higher scores observed during S07 than F06 are the extra coaching and the second chance provided the students (hour exam and the final exam), and the easier question given on the final exam (not having to map acoelomate, pseudocoelomate, eucoelomate Fig. 5). We also do not know if there were differences in the student populations between the two semesters, whether additional instructional differences existed, or how well our assessment techniques accurately quantified student learning.

Our course evaluations at the end of the semester allowed us to obtain anecdotal student responses regarding the comparative biology lab stream. Several students commented that they really enjoyed the dissections. This comment makes sense given that many LB144 students are planning careers in medicine. Another sentiment expressed was that students appreciated the opportunity to demonstrate individual knowledge through the PowerPoint presentations.

Although we have presented here an informal preliminary analysis of student learning, formal quantitative assessment of student learning in the comparative biology stream is ongoing. We created and employed a Phylogeny Assessment Tool to assess prior knowledge (Pre-test) and learning outcomes (Post-test), and collected data from approximately 200 LB144 students during fall semester 2008. Data analyses are in progress and will form the basis of a separate manuscript (Smith and Cheruvelil, in preparation).

## Methods

### EP methods

Evolutionary Probability captures neutral expectations for observing an allele by using a Bayesian analysis of long-term evolutionary history of the sequence. Using a multi-species alignment and phylogenetic relationships among the sequences, Liu et al.’s method [1] first estimates the posterior probability of observing any allele in sequence of interest by using the prior knowledge of the relationship among sequences and the sequences themselves. For example, EP can answer the question: “what is the probability of observing an alanine residue at position 42 in the *human* beta globin protein (HBB), given the multiple sequence alignment for *HBB* in 46 vertebrate species?” To answer such a question, Liu et al.’s method assumes that the actual residue at position 42 in the human sequence is unknown, and produces probabilities for all alleles possible at the site (20 residues for amino acid sequence alignments).

Formally, EP of an allele at a sequence position in a given species in a tree is the weighted mean of a set of posterior probabilities <*PP*_{0}, *PP*_{1}, *PP*_{2}, ⋯ , *PP*_{n}> calculated from the sequence alignment and species phylogeny. *PP*_{0} is the posterior probability of observing a specific allele at a specific position in the focal species where the full dataset is used. Here 0 indicates no sequences are excluded. *PP*_{1} is the posterior probability of the same allele at the same position after excluding the sister species or group closest to the focal species. The 1 indicates that the first closest group to the focal species was excluded. In the phylogenetic tree in Fig. 9, this means that the chimpanzee lineage is excluded when computing *PP*_{1}. This process is repeated for the residual phylogeny, which results in fewer species in progressive pruning steps. The pruning stops when the tree has only one outgroup and the focal species. The number of pruning steps (*n*) depends on the tree topology and the number of sequences in the tree. Figure 9, shows a total of 15 pruning steps for the 46 vertebrate species phylogeny, with humans as the focal species.

Phylogenetic relationships of 46 vertebrate species used for calculating evolutionary probabilities (EP). Nodes ancestral to the focal species, human, are labeled with numbers that correspond to pruning steps in EP calculation algorithm (see Methods). Numbers in parentheses next to the species label represent the step at which the taxon is pruned from the tree. Each of the seven main species groups used in the taxon density sampling are colorized (including the outgroup, lamprey) and labelled

The weights of PPs used to calculate EP are the set of divergence times <*T*_{0}, *T*_{1}, *T*_{2}, ⋯ , *T*_{n}>, where *T*_{i} for all *i* ≥ 0 is the divergence time between the focal species and the closest related taxon in the phylogeny used for calculating *PP*_{i}. Then, using a standard weighted mean formulation:

Therefore, the weights for posterior probabilities are normalized times, and are thus unit-less.

The modified EP approach differs from the EP method of Liu et al. [1] in that the evolutionary relationships (phylogeny) of sequences in the given alignment and the divergence times among clades are both inferred from the sequence alignment itself. We suggest inferring such evolutionary relationships by using model-based methods, e.g., Maximum Likelihood under a suitable substitution model [13], which are known to be more accurate than the alternatives [14, 15]. In order to transform this phylogeny into a timetree, one may use a Bayesian method or a RelTime approach [16]. We selected RelTime, because its computational time requirements are orders of magnitude smaller [17]. Also, RelTime produces excellent relative times without requiring any calibration or other prior assumptions, as shown through extensive computer simulations [17, 18]. Additionally, the RelTime method has a strong theoretical foundation and produces results that are similar to those from Bayesian methods for empirical datasets [19,20,21]. These relative times can be directly used, because the weight function in the EP calculation effectively normalizes divergence times in the input, making relative and absolute times equivalent (see above). Thus, using either absolute times (as used in the Liu et al. application of EP) or relative divergence times (as used in this modification) in the calculations will produce identical results.

In the modified EP approach, however, we also used a modified weight for the EP calculations. Instead of the divergence time between the focal species and the closest related taxa, *T*_{i} is instead the evolutionary time span (ETS see “Evolutionary Time Span” section) of the protein in tree at stage *i*. This approach is different from the Liu et al. implementation of EP, where later pruning steps were given higher weights because divergence time between the focal species and the closest-related taxon increases in subsequent pruning steps. Here we decrease the relative contribution of later pruning steps because an amino acid present in a distant taxon is less likely to be neutral than one observed in a closely-related taxon [22]. The neutrality of an allele can be better estimated as information for more diverse and distant taxa are available at a site. As more taxa are included in a sample, a clearer picture of the results of natural selection can be gleaned.

We refer to the EP method where species relationships and divergence times used are known beforehand as the “original” EP method, and the EP method where species relationships and divergence times are both inferred as the “modified” EP approach.

### Data collection and analysis

We downloaded sequence alignments of 18,621 protein-coding gene orthologs in 46 vertebrate species from UCSC Genome Browser [23] (accessed 21 June 2016). Where duplicate isoforms of the same protein were found, we selected the alignment with the longest sequence. We found that the sequences for 230 human protein-coding genes (“proteins”, henceforth) differed by > 2% from RefSeq canonical sequences, so we excluded these from analyses. The remaining 18,391 sequence alignments were used to compute EP values for all tested approaches.

Missense variants used for evolutionary permissibility classification were acquired from the 1000 Genomes Project Phase III (1KG) dataset [8]. Single nucleotide variants (SNVs) in the 1KG dataset were mapped to human protein coding gene sequences retrieved from UCSC Genome Browser [23]. SNVs that resulted in missense changes were retained for analysis, while synonymous and nonsense changes were filtered out. In subsequent analyses, these missense SNVs were identified solely by resulting amino acid changes. We found 543,220 sites at which a missense mutation occurs in at least one of the 2504 individuals in the set of 18,391 proteins analyzed. For each protein, we computed amino acid EP values using MEGAX [24] under a Poisson model with a discrete Gamma distribution of rates (5 categories) that includes invariant sites (G + I). Other models could have been specified, but the estimates of EP were previously shown to be robust to the complexity of substitution model used [1]. For analyses where the phylogeny was presumed to be unknown, we first calculated maximum-likelihood trees in MEGAX using the same substitution models used in the EP calculation branch lengths were discarded and only the topology was used.

Our human disease dataset consists of 50,422 disease associated missense variants retrieved from the Human Gene Mutation Database (HGMD, http://www.hgmd.cf.ac.uk/ac/) [25]. Candidate Adaptive Polymorphisms (CAPs) were retrieved from http://mypeg.info/caps (accessed 21 June 2016). EP for each variant was calculated using the modified EP method described above.

#### Calculating ΔeForb

For a given protein, we quantified the proportion of incorrect inference under the modified EP method (ΔeForb). For each protein, we first determined the number of sites at which missense variants were found in the 1KG data set. At each site, we considered both segregating alleles (1KG reference allele and the alternate allele) and gave them eForb designation by using the EP values produced by the original EP method (retrieved from http://mypeg.info/ep accessed 21 June 2016). If such an eForb was not found to have EP < 0.05 when using the modified EP approach, then it contributed to ΔeForb fraction. A ΔeForb of 50% indicates that 50% of all alleles at missense sites, which were eForbs by the original EP method, received an EP > 0.05 by the modified EP approach.

### Evolutionary time span

A protein’s evolutionary time span (ETS) is the average of positional time spans (PTS) across all sites in a protein sequence alignment. PTS at a site is the total time along all branches in a tree for which a valid base (or residue, depending on whether nucleotide or protein sequence alignment is used) has existed in the evolutionary history of the site [26]. Alignment gaps and missing data in a multiple sequence alignment are not considered valid bases. To compute PTS for a site in a sequence alignment, the independently established timetree, or master timetree (used in the original EP calculation), is pruned such that only taxa that have a valid base at that site are retained. PTS is then simply the total time spanned by the resulting timetree (sum of times spanned by each branch) for that site. PTS will be a maximum for a site which has a valid base for all taxa in the master timetree.

Residue evolutionary time span (RTS) is the total time that a specific residue has been found in the evolutionary history of a site [27]. RTS is calculated by pruning the master timetree such that only taxa that possess the specified residue are retained. RTS is the total time spanned by the resulting timetree (sum of times spanned by each branch) of a residue at a site. A residue that is not found in any sequence at a site has RTS of 0. RTS for all amino acids at a site will sum to the PTS for that site. A relative residue time span is often more informative than simple RTS, because it accounts for the PTS of a site and allows for comparison between sites with different PTS.

ETS can serve as a proxy for the amount of sequence information available ETS that is close to the maximum indicates that there are few gaps in the sequence alignment, while ETS that is much lower than the maximum indicates a larger number of alignment gaps. PTS can convey similar information at the per-site level. Similarly, a small RTS means that the residue was found in a limited number of species and occupied that position for a limited amount of evolutionary time. In contrast, a large RTS means that the residue is commonly observed among species. Thus, time spans can be more informative to the properties of a sequence alignment as a relative value. So, here, we refer to all time span values as fractions of the maximum possible value of that measure (%ETS, %PTS, %RTS) i.e., %ETS is the proportion of a sequence alignment with no invalid bases covered by the ETS of the protein (ETS / maximum possible ETS), %PTS is the proportion of the time span covered by PTS for a site with valid bases for all species in the alignment (PTS / maximum possible PTS), and %RTS is the proportion of the PTS spanned by a specific allele (RTS / PTS).

### Tree distance

Branch-length distance [28] was used to quantify the error in inferred phylogenies, which were used in the modified EP analyses. The inferred tree was compared to the timetree used in the original EP method, but since the inferred tree produced relative time branch lengths, we first scaled the inferred tree such that its sum of branch lengths was equal to that of the original EP timetree. The branch-length distance, unlike simple symmetric differences or partition metrics, measures both differences in topology as well as branch length differences of the trees being compared. Such a measure is useful here because EP incorporates both species relationships (topology) and divergence times (branch lengths) into its calculations, so an ideal distance measure will capture differences in both of these properties.

### Taxon sampling

#### Sampling within clades

In our taxon “density sampling” experiments, the number of taxa included in each major clade of the 46 species vertebrate tree were varied (Fig. 9). We generated 100 replicate samples for one, two, three, and four taxa per clade (density) for seven clades (A-G, Fig. 9). Taxa were randomly sampled from these clades when generating replicate datasets, and humans were used as the focal species. For each analyzed clade density, the mean and standard error of EP were calculated for each residue, separately for original and modified approaches. Additionally, the mean ETS for all replicates was recorded for each clade density.

#### Sampling between clades

“Temporal sampling” iteratively increases the number of taxa distantly related to the focal species, human (Fig. 9). In each iteration, the next closest related taxon to the previous dataset is included. The first iteration requires a minimum of 3 taxa to analyze: human, chimpanzee, gorilla the second iteration added orangutan, the fourth added rhesus monkey, until the final iteration contained all taxa including the lamprey.

### Receiver operating characteristic (ROC)

We calculated true eForb and false eForb classification rates under various eForb thresholds (EP value below which an allele is considered evolutionarily forbidden 10 evenly spaced thresholds between EP < 0.01 and EP < 0.1) to determine the performance of the modified EP approach relative to the original EP method. For a given eForb threshold, we identified each eForb variant in the 1KG dataset based on EP values from the original EP method as the set of “condition positive”. 1KG variants that were not eForbs comprised the set of “condition negative” variants. For the same set of 1KG variants, we collected the set of eForbs identified across a variety of discrimination thresholds based on modified EP values as the set of “predicted condition positive” variants. Variants not predicted to be eForbs using modified EP values were the set of “predicted condition negative” variants. True(/false) eForb classification rates were calculated as the fraction of condition positive(/negative) variants that were correctly classified as eForbs(/not eForbs) when using the original EP values as the ground truth. ROC curves were generated for each of the eForb thresholds from 0.01 to 0.10, as described above.

## Internodes

By contrast, internodes are the sections of stem between nodes. If the nodes are the crucial “organs” of the plant, the internodes are the blood vessels carrying water, hormones, and food from node to node.

Usually, internodes seem long and provide spacing between nodes of many inches. However, some plants are notable for how close together their leaves, and thus their nodes, always are. Dwarf conifers have closely-spaced nodes. Yews and boxwoods, with their very dense leaves, also always have short internodes. This fact is why they can be sheared or pruned into any shape, including the special form of topiaries.

## How Parsimony Works

Starting with a set of species and a set of genetic traits, the parsimonious approach would be to look at which traits are shared between species. The tree is constructed by working through the possible relationships for each trait and selecting the option that has the fewest number of state changes. That is the intermediate ancestor species for each trait. The creation of the tree continues until it gets to the root, or common ancestor, for all of the species being mapped.

## 2.1 Reading Trees

A phylogenetic tree is an illustration depicting the hypothesized *degrees* of evolutionary relationship amongst a selected set of taxa (singular = taxon). The taxa are typically species, but can also be higher-level Linnaean groupings like genera or families. Alternatively, some phylogenetic trees depict relationships among individuals within a species (e.g., from geographically isolated populations). Regardless of their rank, the taxa depicted in a phylogenetic tree are often called terminal taxa , because they occur at the tips of the tree. They are sometimes referred to as "terminals" or "leaves."

Terminal taxa are connected by branches . The branches are the line segments that make up the tree. Branches come together at branching points called nodes . Each nodes represents a common ancestor shared by two or more terminal taxa.

Parts of a phylogenetic tree, including terminal taxa, branches, and nodes. Image by Jonathan R. Hendricks (Creative Commons Attribution-Sharealike 4.0 International license).

#### The Relatedness of Taxa

Remember that phylogenetic trees depict *degrees* of relationship among taxa. On a phylogenetic tree, more closely related terminal taxa are connected by shallower nodes (i.e., nodes nearer to the tips of the tree) and more distantly related terminal taxa are connected by deeper nodes (i.e., nodes nearer to the base of the tree).

Examine the figure above. In that figure, Taxon B and Taxon C are more closely related to one another that either is to Taxon A. We know this because Taxon B and Taxon C share a shallower node (the blue node) than then node that either shares with Taxon A (the yellow node). Another way of putting this observation is that Taxon B and Taxon C share a node (the blue node) that neither shares with Taxon A. Taking a broader view, we also know that Taxa A, B, and C are more closely related to each other than they are to Taxa D, E, F, G, and H. This is because Taxa A, B, and C share the yellow node in common, which does not link to Taxa D, E, F, G, or H. Taxa A, B, and C are linked to Taxa D, E, F, G, and H by a deeper node (the red node).

Branches may be rotated about nodes without any change in the hypothesized relationships depicted on a tree diagram. Convince yourself that the three trees below depict the exact same set of relationships among Taxa A-H. In each case, the only change that has been made is that branches and the terminal taxa connected to them have been rotated around nodes. Remember, the degree of relationship amongst various combinations of terminal taxa is indicated by the relative depth of the nodes that connect them.

Tree A. Image by Jonathan R. Hendricks (Creative Commons Attribution-Sharealike 4.0 International License).

Tree B. Image by Jonathan R. Hendricks (Creative Commons Attribution-Sharealike 4.0 International License).

Tree C. Image by Jonathan R. Hendricks (Creative Commons Attribution-Sharealike 4.0 International License).

The graphic style of phylogenetic trees varies. For example, the tree shown below depicts the exact same pattern of relationships among Taxa A–H as the three trees shown above.

A bracket-style phylogenetic tree. Image by Jonathan R. Hendricks (Creative Commons Attribution-Sharealike 4.0 International License).

Increasingly, it is common to see circle-shaped phylogenetic trees such as the one shown below. Circular trees are often used to illustrate relationships among members of major groups of extant organisms, and these trees may have many terminal taxa. Circular trees can be read in the same way as the trees shown above, because the relative depth of the nodes indicates the degree of relatedness among terminals.

A circle-shaped phylogenetic tree that depicts relationships among the major groups of living organisms (blue = bacteria green = Archaea pink = eukaryotes). This tree was created by Ivica Letunic and was retraced by Mariana Ruiz Villarreal (public domain).

#### Clades and Sister Groups

A clade (from the Greek *klados* = branch) is a group that includes an ancestor (node) and all of its descendants (all shallower nodes and terminal taxa that descend from that node) on a phylogenetic tree. If you pick a node on a phylogenetic tree, you can easily draw a circle around the clade that it defines, as in the tree below. While many clades have no formal names, some important clades are named in formal classification schemes (for more on clades and classification, see this later section).

A phylogenetic tree that illustrates the concept of clades. Note that clades are not mutually exclusive, but are nested within one another. Image by Jonathan R. Hendricks (Creative Commons Attribution-Sharealike 4.0 International license).

Clades are not mutually exclusive, but rather form nested sets on a tree. Thus, any given taxon can belong to many clades. For example, in the tree above, Taxon B belongs to three clades, a clade defined by Node 1, a more inclusive clade defined by Node 2, and an even more inclusive clade defined by Node 7. Taxon E belongs to four clades defined by Nodes 3, 5, 6, and 7, respectively. Try to figure out how many clades Taxon A and Taxon H belong to, and determine which nodes define each clade.

Sister taxa or sister groups are pairs of terminal taxa and/or clades that branch from a common node and are often considered closely related. Pairs of sister terminal taxa in the figure above include: B and C, E and F, and G and H. The clade defined by Node 3 (Node 3 + Taxon E + Taxon F) is sister to the clade defined by Node 4 (Node 4 + Taxon G + Taxon H). Terminal taxon A is sister to the clade defined by Node 1 (Node 1 + Taxon B + Taxon C). Try to find more sister pairs on the tree above.

#### The Meaning of Branch Lengths

Two things are implicitly occurring along the branches of a phylogenetic tree. The first is the passage of time. Deeper nodes are older than the shallower nodes to which the are connected. Thus, deeper nodes indicate both more distant relationships among the terminal taxa that they connect, as well a greater age for the most recent common ancestor of those taxa. The second thing is evolutionary modification, or the accumulation of hereditary genetic and/or structural changes along branches. While these changes are often not shown (mapped) directly on the branches, it is these inferred changes that underpin the construction and interpretation of a phylogenetic tree. When systematists talk about " branch lengths ," they are typically referring to the number of these changes.

So, does the length of the branches as depicted on a phylogenetic tree (in other words, the length of the branches on an actual diagram showing a hypothesis of evolutionary relationships) mean anything? The answer is: it depends.

Time and number of evolutionary changes may have no direct relationship to the relative lengths of branches as depicted on a tree. Many such trees are cladograms, or branching diagrams made using clastic methods, which have their roots in the work of Willi Hennig. (Note: The term " cladogram " is sometimes applied to any type of phylogenetic tree.) Often, diagrams that are drawn for general informational purposes to depict a consensus hypothesis of relationships amongst a group of taxa (for example, in a textbook) also do not have branches scaled to time or to number of evolutionary changes. In this type of diagram, the taxa will either be aligned at the branch tips or all branches will be about the same length (meaning that the taxa are not aligned).

A phylogenetic tree in which the branches are not scaled to time or evolutionary change. This diagram is a cladogram. From: Turner et al. (2017) *PLoS ONE* 12(2): e0169885. Used in accordance with Creative Commons Attribution 4.0 International (CC BY 4.0) License.

In other cases, branches on a tree are scaled so that they reflect the amount of evolutionary change (in other words, the number of modifications in characteristics) that has occurred. In this type of diagram, branch lengths will differ and taxa will not be aligned at the branch tips. Sometimes this type of tree is called a phylogram.

Example of a phylogenetic tree with branches scaled to depict number of evolutionary changes (a phylogram). Notice the the branch lengths on the diagram differ and the taxa are not aligned. From: Zapata et al. (2015) *PLoS ONE* 10(10): e0139068. Used in accordance with Creative Commons 0 1.0 Universal, Public Domain Dedication (CC0 1.0).

Trees may also have branch lengths that are scaled to time, making the relationship between relative node depth and time explicit. Typically, a time scale (relative and/or numerical) will be included beside the tree to indicate the timing of branching events. If a tree is explicitly scaled to time, it can be called a chronogram such trees are also sometimes called " time trees " (also time-trees or timetrees). If all taxa in a chronogram are extant (living), they will be aligned at the present.

Example of a phylogenetic tree with branches scaled to depict time (a chronogram). The gray bars at the nodes are error bars. From: Zhang et al. (2013) *PLoS ONE* 8(7): e70449. Used in Accordance with Creative Commons Attribution (CC BY) License.

Sometimes, extinct taxa may be included as terminals on a phylogenetic tree. If such a tree has branches scaled to time, extinct taxa will not be aligned at the present time. Rather, the branch tips for extinct taxa will end at the levels in time at which they went extinct, as shown below.

Examples of cladograms scaled to time and including extinct taxa as terminals. From: Wright and Stigall (2013) *PLoS ONE* 8(7): e68353. Used in Accordance with Creative Commons Attribution (CC BY) License.

Please explain in more detail

I'm sorry, but in my opinion, you are wrong. I'm sure. I propose to discuss it. Write to me in PM.

I fully share your opinion. I think it is a good idea.

You must tell you have been misled.