Information

User friendly phylogenetic tree editing software


Background: I have the Hackett 2009 phylogenetic tree (from birdtree.org) and I need to reorder it according to Prum 2015. I also work with only the species relevant to my project, not the entire tree.


I currently use Mesquite, and it is absolute purgatory. Any reordering in Mesquite takes an insane amount of time. I'd also like to add higher-taxon information (such as order) to each species to help me orient myself in the data. I understand that some of Mesquite's shortcomings might be shortcomings of the Nexus format. As I don't need to analyse the data myself, I can work in a better format such as PhyloXML, as long as it's possible to export back to Nexus in the end.

Is there any way to do this? Is there software with a better visual editor (and better overall UX), or some other method of doing this efficiently? I find it hard to believe that much bigger data sets are handled in such a clumsy way.


Yes, I think that you should consider using MultiSeq, which is part of Visual Molecular Dynamics (VMD). You can use VMD to make phylogenetic trees based on sequence, structural, or other statistical measures.

If you want to check it out and see if it works for you, then install VMD: http://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=VMD

and try out the tutorials under Bioinformatics from the distributors:

http://www.ks.uiuc.edu/Training/Tutorials/

Note: VMD doesn't work very well on 64-bit Windows, if that's what you use. To get it to work well, I suggest you install VirtualBox from Oracle and install your favorite free, open-source Linux distribution as a guest machine on your host machine.

I suggest allocating the guest Linux OS half the RAM of your host computer, and about 100 GB of disk space for when you discover that you love Linux so much that you keep lots of data on it.

To run your guest Linux OS full-screen, you need to install the Guest Additions.

My favorite Linux OS is Kali Linux.

To install the Guest Additions on Kali Linux and most Linux guests on VirtualBox, you have to do this:

But as I've discovered, you have to upgrade Kali Linux to Kali Linux Rolling to get it to go full-screen. How you do that is via this:

And you probably shouldn't do your work as "root", the default user. That gives you too much power, so to increase security, create a new user on Kali Linux:

If you are already using Linux, then you won't have any problem getting VMD to work. By the way, you have to install VMD from the command terminal in Linux, and this is how you do it.

The instructions this guy follows (sorry, he doesn't speak English) are in the README file that comes with the VMD download.

Sorry, apparently, I don't have a high enough reputation to cite you all the links. I'm sorry that I had to remove them.


I had good experiences with Archaeopteryx. This is a Java package, so it should be easy to run, but I recommend Bio-Linux if you don't want to spend too much time installing and want to get straight to the biology. You can run it as a virtual machine or install it alongside your existing system. It is getting a bit old, but everything works out of the box and there is a wide array of tree-editing tools installed. Just look through the package list.


PhySpeTree: an automated pipeline for reconstructing phylogenetic species trees

Phylogenetic species trees are widely used in inferring evolutionary relationships. Existing software and algorithms mainly focus on phylogenetic inference. However, less attention has been paid to intermediate steps, such as processing extremely large sequences and preparing configuration files to connect multiple software tools. When the species number is large, the intermediate steps become a bottleneck that may seriously affect the efficiency of tree building.

Results

Here, we present an easy-to-use pipeline named PhySpeTree to facilitate the reconstruction of species trees across bacterial, archaeal, and eukaryotic organisms. Users need only to input the abbreviations of species names; PhySpeTree prepares complex configuration files for different software, then automatically downloads genomic data, cleans sequences, and builds trees. PhySpeTree allows users to perform critical steps such as sequence alignment and tree construction by adjusting advanced options. PhySpeTree provides two parallel pipelines based on concatenated highly conserved proteins and small subunit ribosomal RNA sequences, respectively. Accessory modules, such as those for inserting new species, generating visualization configurations, and combining trees, are distributed along with PhySpeTree.

Conclusions

Together with accessory modules, PhySpeTree significantly simplifies tree reconstruction. PhySpeTree is implemented in Python running on modern operating systems (Linux, macOS, and Windows). The source code is freely available with detailed documentation (https://github.com/yangfangs/physpetools).


Mavric 0.8.3

:: DESCRIPTION

Mavric is a python module for the manipulation and visualization of phylogenetic trees. It is also a recursive acronym for Mavric Visualizes Rick’s Cladograms :) It aims to be a user-friendly tool for manipulating phylogenetic trees on *NIX-like systems, especially Linux. As such it complements other phylogeny programs like those in the PHYLIP package, which for all their strengths currently lack a nice graphical interface.



A biologist’s guide to Bayesian phylogenetic analysis

Bayesian methods have become very popular in molecular phylogenetics due to the availability of user-friendly software for running sophisticated models of evolution. However, Bayesian phylogenetic models are complex, and analyses are often carried out using default settings, which may not be appropriate. Here we summarize the major features of Bayesian phylogenetic inference and discuss Bayesian computation using Markov chain Monte Carlo (MCMC) sampling, the diagnosis of an MCMC run, and ways of summarizing the MCMC sample. We discuss the specification of the prior, the choice of the substitution model and partitioning of the data. Finally, we provide a list of common Bayesian phylogenetic software packages and recommend appropriate applications.

Bayesian phylogenetic methods were introduced in the 1990s [1,2] and have since revolutionized the way we analyse genomic sequence data [3]. Examples of such analyses include phylogeographic analysis of virus spread in humans [4,5,6,7], inference of phylogeographic history and migration between species [8,9,10], analysis of species diversification rates [11,12], divergence time estimation [13,14,15] and inference of phylogenetic relationships among species or populations [13,16,17,18,19,20]. The popularity of Bayesian methods seems to be due to two factors: (1) the development of powerful models of data analysis and (2) the availability of user-friendly computer programs to apply the models (Table 1).


Descriptive Sequence Analysis

RNA and Protein Secondary Structure Prediction and Computation of Minimum Folding Energy

DAMBE uses the Vienna RNA Secondary Structure library (Hofacker 2003) to predict secondary structure of RNA sequences and to compute their minimum folding energy (MFE). It features graphic display of secondary structures (supplementary fig. S2, Supplementary Material online). Several studies have used MFE from DAMBE to study the relationship between N-terminal of mRNA and protein translation (e.g., Xia and Holcik 2009; Zid et al. 2009; Xia et al. 2011). DAMBE uses a hidden Markov model for predicting protein secondary structure based on training sequences with experimentally determined protein structure (Xia 2007b, p. 109–132).

Improved Codon Usage Indices

Codon usage bias reflects the joint effect of mutation bias and tRNA-mediated selection (Ikemura 1981; Xia 1996, 1998a, 2005, 2008, 2012c; Xia et al. 1996, 2007; Carullo and Xia 2008; Palidwor et al. 2010; Ran and Higgs 2012). DAMBE implements improved versions of widely used indices of codon usage bias, including the gene-specific codon adaptation index (Sharp and Li 1987; Xia 2007c) and the effective number of codons (Nc; Wright 1990; Sun et al. 2012), as well as the codon-specific relative synonymous codon usage (RSCU). These improved codon bias indices have contributed to the discovery of a modified tRNA pool for translating HIV-1 late genes (van Weringh et al. 2011), the effect of poly(A) tracts at yeast 5′-untranslated regions (5′-UTRs) (Xia et al. 2011), and the elucidation of the function of +4G in the Kozak consensus in mammalian mRNAs (Xia 2007a).
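As an illustration of the codon-specific index mentioned above, the sketch below computes RSCU in its conventional form: the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. It is not DAMBE's improved implementation; the function name `rscu` is hypothetical, and the standard genetic code (NCBI translation table 1) is assumed.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(cds):
    """Return {codon: RSCU} for every amino acid observed in a coding sequence.

    Conventional definition (an assumption here, not DAMBE's improved index):
    RSCU = codon count * number of synonymous codons / total count for that amino acid.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group synonymous codons by the amino acid they encode.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    result = {}
    for aa, members in families.items():
        total = sum(counts[c] for c in members)
        if aa == "*" or total == 0:           # skip stop codons and unobserved amino acids
            continue
        for codon in members:
            result[codon] = counts[codon] * len(members) / total
    return result


values = rscu("GCTGCTGCCGCA")  # four alanine codons: GCT twice, GCC and GCA once each
print(values["GCT"], values["GCG"])
```

An RSCU of 1 means a codon is used exactly as often as expected under equal usage of its synonyms; values above 1 indicate preference.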

Nucleotide Skew Plots

The two DNA strands are often subject to different mutation pressures mediated by different DNA replication mechanisms and coding sequence bias. Nucleotide skew plots can often provide hints about mutation and selection operating during the evolutionary process (Lobry 1996; Marin and Xia 2008; Xia 2012a, 2012c). One main problem with conventional nucleotide skew plots is the choice of the sliding window size (fig. 2). A window size too small will include too much noise and obscure interesting patterns, and a window size too large will often fail to identify precisely the point where abrupt changes of nucleotide composition occur (which is typically associated with the origin and termination of DNA replication). DAMBE defines the optimal window size as the one that maximizes the area enclosed by the skew curve and the horizontal line specified by the global skew (fig. 2). The empirical justification of such a definition is that the site where the skew curve changes polarity is always very close to the experimentally verified origin and termination of DNA replication in bacterial genomes. Users can specify their own window size and step size.

Skew plots of the Bacillus subtilis genome at three different window sizes, with the skew curve colored in red having the optimal window size. The horizontal line is the global GC skew computed from the entire genome.
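To make the window and step parameters concrete, here is a minimal sketch of a sliding-window GC skew, using the usual definition (G − C)/(G + C). The function names are hypothetical, and DAMBE's criterion for choosing the optimal window size is not reproduced here.

```python
def gc_skew(seq):
    """GC skew (G - C) / (G + C) of a sequence; 0.0 if it has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def skew_profile(genome, window, step):
    """Return [(window_start, skew), ...] for windows sliding along the genome."""
    return [
        (start, gc_skew(genome[start:start + window]))
        for start in range(0, len(genome) - window + 1, step)
    ]


genome = "GGCCGGGG"
print(gc_skew(genome))               # global skew: the horizontal reference line
print(skew_profile(genome, 4, 4))    # one value per window
```

Plotting the second element of each pair against the first, together with the global skew as a horizontal line, gives the kind of curve described in the caption above.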

Protein Isoelectric Point Profiling

Protein isoelectric point (pI) is important for understanding interactions between proteins and other cellular components because many such interactions are electrostatic; for example, a positively charged enzyme is attracted to its negatively charged substrate. DAMBE computes theoretical protein pIs by an iterative algorithm (Xia 2007b, p. 207–219). Empirical data based on protein pI from the acid-resistant gastric pathogen, Helicobacter pylori, have been used to test three key evolutionary hypotheses: the preadaptation hypothesis, the exaptation hypothesis, and the adaptation hypothesis (Xia and Palidwor 2005). pI from DAMBE has also been used to study the adaptive evolution of the matrix extracellular phosphoglycoprotein in mammals and the implication of its change on protein folding (Machado et al. 2011). DAMBE uses computed pI in its in silico 2D gel, where input protein sequences are displayed on an in silico gel based on their charge and molecular weight (Xia 2007b, p. 207–219). Deviation of the observed protein location on the gel from the in silico prediction indicates posttranslational modification.
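The text does not spell out the iterative algorithm, so the following is only one plausible sketch: a bisection search for the pH at which the net charge from the Henderson-Hasselbalch equation is zero. The pKa values are illustrative assumptions (commonly tabulated figures), and DAMBE may use different values and a different iteration scheme.

```python
# Illustrative side-chain and terminal pKa values (assumed, not taken from DAMBE).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA, C_TERM_PKA = 8.6, 3.6


def net_charge(protein, ph):
    """Net charge of a protein at a given pH (Henderson-Hasselbalch)."""
    charge = 1 / (1 + 10 ** (ph - N_TERM_PKA))      # protonated N-terminus
    charge -= 1 / (1 + 10 ** (C_TERM_PKA - ph))     # deprotonated C-terminus
    for aa in protein.upper():
        if aa in POSITIVE_PKA:
            charge += 1 / (1 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1 / (1 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def isoelectric_point(protein, tol=1e-4):
    """Bisect on pH in [0, 14]; net charge falls monotonically as pH rises."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(protein, mid) > 0:
            lo = mid        # still positively charged: pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


print(isoelectric_point("KKKK"), isoelectric_point("DDDD"))
```

A lysine-rich peptide comes out basic and an aspartate-rich one acidic, which is the qualitative behaviour the in silico gel relies on.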

Plot Amino Acid Properties along the Protein Sequence

Amino acids (AAs) are characterized by size, charge, hydrophobicity/polarity, and their tendency to form α helices and β sheets. Plotting these properties along the protein sequence can often shed light on local structures and functional domains. For example, DNA or RNA binding domains are typically characterized by a stretch of positively charged AAs such as lysine, arginine, and histidine, whereas transmembrane proteins typically contain hydrophobic domains (fig. 3). The presence of these domains creates structural heterogeneity and represents a major source of rate heterogeneity in nonsynonymous substitutions among sites (Xia 1998b; Xia and Li 1998), which can often bias phylogenetic estimation. Several homologous sequences can be plotted jointly for one to visualize how AA substitutions lead to changes in protein phenotype (fig. 3). DAMBE’s function for plotting these AA properties along protein sequences is accessed by clicking “Graphics|amino acid properties along sequences.”

Hydrophobicity plot for human (NP_000530.1) and avian (Emberiza bruniceps: AFK10338) rhodopsin with seven transmembrane domains (peaks). The weak 7th peak is due to a relatively short α-helix. Output from DAMBE. A sliding window of 12 AAs is used.
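A plot like this is a moving average of a per-residue scale. The sketch below assumes the Kyte-Doolittle hydropathy scale, which the text does not name, so the scale and the function name `hydropathy_profile` are assumptions; the default window of 12 follows the caption.

```python
# Kyte-Doolittle hydropathy values (assumed scale; DAMBE may offer others).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=12):
    """Mean hydropathy of each window of `window` residues along the protein.

    Residues missing from the scale (e.g. X) contribute 0.0.
    """
    protein = protein.upper()
    return [
        sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in protein[i:i + window]) / window
        for i in range(len(protein) - window + 1)
    ]


# The sequence fragment used as a FASTA example elsewhere in this document.
profile = hydropathy_profile("LCLYTHIGRNIYYGSLPLYSETWNTGIMLLLITMATAFMGY")
print(len(profile), max(profile))
```

Peaks in the profile mark hydrophobic stretches, which is how the transmembrane domains show up in the figure.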

Nucleotide, Dinucleotide, AA, and Di-AA Frequencies

These simple frequencies not only serve as an excellent entry point for teaching molecular evolution but can also lead to significant biological insights on spontaneous mutation during the evolutionary process (Xia et al. 1996, 2006; Xia 2003, 2012a, 2012c; Xia and Yuen 2005). For example, Mycoplasma genitalium has much lower genomic CpG dinucleotide frequencies than M. pneumoniae, but differential CpG-specific DNA methylation has been excluded as an explanation because neither species has any CpG-specific methyltransferase. It was found that their sister species, M. pulmonis, as well as several other deeper-rooted relatives, have CpG-specific methyltransferases and have even lower CpG dinucleotide frequencies. This restores DNA methylation as an explanation for variation in CpG frequencies between M. genitalium and M. pneumoniae. That is, the common ancestor of M. genitalium and M. pneumoniae lost the CpG-specific methyltransferases, and both daughter lineages began to rebound in CpG frequencies. Because M. pneumoniae has evolved much faster than M. genitalium, its CpG frequency has rebounded to a much higher level than that of M. genitalium (Xia 2003). Similarly, di-AA frequencies among proteomes from a diverse array of organisms have revealed constraints of AA by their neighbors (Xia and Xie 2002), and experimental evolution has shown that Pasteurella multocida cultured at increasing temperature for over 14,400 generations decreased genomic GC (Xia et al. 2002), contrary to the conventional hypothesis that genomic GC should increase with increasing environmental temperature.


Learning to Become a Tree Hugger

Amy Maxmen
Aug 1, 2011

Output from a BEAST analysis viewed in the FigTree program showing the inferred phylogenetic relationships among >300 ant samples from around the world. CORRIE SAUX MOREAU, FIELD MUSEUM OF NATURAL HISTORY

Constructing an evolutionary tree can seem as unappetizing as filing taxes to those not fluent in computer-speak. But, alas, learning how one organism relates to another is often a necessary first step in approaching biological questions, be they about the evolution of drug-resistant strains or the origin of body parts. Advanced software for aligning genetic or protein sequences and constructing phylogenies exists, but most programs require entering lines of computer script. Richard Ree, an evolutionary biologist at the Field Museum of Natural History in Chicago, explains that the scant commercial interest in developing phylogenetics software has forced biologists to largely write programs on their own. “As a result, the user interface tends to suffer because we don’t.

But fear not: point-and-click tree-building and tree-visualization programs do exist, and they might be all you need to get where you’re going if phylogenetics isn’t your long-term calling. As a service to biologists with deep ideas but a phobia of JavaScript and “R,” The Scientist presents a tour of free software for aligning sequences, building phylogenies, learning about evolution, and showing off a clear, visually pleasing final tree in presentations and publications.

How do I prep sequences for comparison?

The first step of any DNA or protein sequence comparison is to align sequences so that homologous nucleotide or amino acid positions line up across taxa. After you’ve gotten reliable DNA or protein sequences, you’ll need to convert each sequence into a text-based format called FASTA, if it’s not already in that style. To do so, just copy and paste your sequence into any word-processing document, then give the sequence an identifying label that begins with “>” and ends with a line break. Insert the sequence on the line after the label. If it’s a protein, it should look something like this:

>gi|5524211|gb
LCLYTHIGRNIYYGSLP
LYSETWNTGIMLLLITMATAFMGY

If you’re adding sequences from GenBank, just download them in FASTA format and copy and paste them into the same file. Save the file of all your FASTA sequences as a .txt file.
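For readers who would rather check a FASTA file programmatically than by eye, here is a minimal sketch of the format: each record is a header line starting with “>” followed by one or more sequence lines. The function names are hypothetical, and the parser keeps only the first word of each header as the label.

```python
def parse_fasta(text):
    """Parse FASTA text into {label: sequence}, joining wrapped sequence lines."""
    records, label = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            label = line[1:].split()[0]   # first word of the header is the label
            records[label] = ""
        elif label is not None:
            records[label] += line
    return records


def format_fasta(records, width=60):
    """Write {label: sequence} as FASTA text with sequence lines of fixed width."""
    lines = []
    for label, seq in records.items():
        lines.append(f">{label}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


text = ">gi|5524211|gb\nLCLYTHIGRNIYYGSLP\nLYSETWNTGIMLLLITMATAFMGY\n"
records = parse_fasta(text)
print(records)
print(format_fasta(records, width=20))
```

Parsing and re-writing a file this way is a quick check that every sequence has a label and that no label has been merged into the sequence above it.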

A popular workhorse for alignment is Clustal, but there are many others. Platforms such as SeaView drive various alignment and phylogeny programs, including Clustal, and make them easier by simplifying them to their most basic features. “These online resources take some of the difficulty out of running particular programs, which is half the battle,” says Corrie Moreau, a Field Museum biologist who specializes in ant evolution.

To use Clustal via SeaView, open your .txt file in SeaView. Your sequence names will appear in the left pane and the corresponding sequences in the right pane. Click Align → Alignment options and select Clustal (SeaView drives the version ClustalW2). Next click Align → Align all. A window showing the progression of the alignment procedure will appear. Save the completed alignment as a NEXUS file. You’re now ready to make a tree.

How do I construct a phylogeny that will tell me when organisms evolved?

Before jumping into one of the many phylogeny programs available, think about what you ultimately want to know. If you simply need a tree of relationships, then maximum-likelihood programs like RAxML, parsimony programs like TNT, or Bayesian probability programs like MrBayes will do the job. Although these three types of programs use different mathematical methods to analyze evolutionary relationships, the resulting trees should be quite similar. While some phylogeneticists adhere to a single method, many biologists prefer to confirm their work by using two or three. Web-based platforms, like SeaView, make some of these programs and others simpler to use, but be prepared to consult the program manual.

If you want to assess when organisms evolved, you’re in luck, because the phylogeny program BEAST makes that task less daunting. Moreau’s lab uses BEAST because it can incorporate fossil evidence, geologic data, and known mutation rates to estimate species relationships and divergence times simultaneously.

With the BEAST folder open, double-click on BEAUti, BEAST’s graphical user interface. In BEAUti, select File → Import Alignment and select your NEXUS-formatted alignment. What you do next depends on how you want to measure time: via fossils, geology, and/or mutation rate. Moreau uses fossils and geology to set age limits. “If I have a fossil and I know it belongs in the same group as some of my ants, I tell BEAST that group of ants must be at least as old as the fossil,” she explains. “Or if a group of ants is endemic to an island, I know that group can’t be older than the island.” Alternatively, if a gene you’ve sequenced to create your phylogeny has a known rate of mutation, BEAST can use it to estimate when each taxon originated.

To enter fossil or geological information, click on the Priors tab and highlight the group of taxa related to the fossil, as well as the organism most closely related to this group. Enter the age of the fossil or geological cue (e.g., the age of the island) into the section labeled “TMRCA” (time to the most recent common ancestor). To enter a known or estimated mutation rate, click the Clock Model tab, select Strict Clock and insert the rate. For help or to explore other functions, check out online tutorials or the BEAST user group, which is monitored by the developers who wrote the program.

After you save your settings as an XML file, go back to the BEAST folder, open BEAST and select Run. When the program has finished running, import the output file into TreeAnnotator (also in the BEAST folder). BEAST generates many plausible trees, each with an associated probability, since it’s impossible to determine the tree with 100 percent certainty. As a result, the data file generated directly from BEAST is too large to use as it is. TreeAnnotator singles out one representative tree and annotates it with information summarized from the other probable trees. For example, if a large proportion of the plausible trees agree on a relationship between A and B, it will indicate that the relationship between A and B is well supported. Save this tree as a .tree file. Next, open your .tree file in FigTree. Here you can arrange other outputs of the program, such as divergence dates (with their corresponding error bars). Save this tree as a NEXUS file. Among other information within that file, a line full of parentheses, such as (orangutan,(chimp,human)); will encode your tree in a format known as Newick, which phylogeny-related programs universally understand.
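To show what that parenthetical line encodes, here is a minimal sketch of a Newick reader. In standard Newick, sister groups are separated by commas and the string ends with a semicolon, as in (orangutan,(chimp,human)); optional branch lengths follow a colon. The sketch assumes plain labels with no quotes, comments, or embedded whitespace, and the function names are hypothetical.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts (name, length, children)."""
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1                          # consume "("
            children.append(node())
            while text[pos] == ",":
                pos += 1                      # consume "," between siblings
                children.append(node())
            pos += 1                          # consume ")"
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos].strip()
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1                          # consume ":" before a branch length
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    return node()


def leaf_names(tree):
    """List the tip labels of a parsed tree from left to right."""
    if not tree["children"]:
        return [tree["name"]]
    return [name for child in tree["children"] for name in leaf_names(child)]


tree = parse_newick("(orangutan,(chimp,human));")
print(leaf_names(tree))
```

The nesting of the parentheses is the tree: chimp and human share a parent node, and that node is sister to orangutan.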

How do I use my phylogeny to learn about the evolution of features?

Now that you have a tree, you’re ready to test ideas about how or why those organisms diversified. Did a horned beetle give rise to many horned species, or did these horned species arise independently from a beetle with a smooth noggin? This might sound like a simple question, but when you have 100 taxa and 8 character states (e.g. tall horn, jagged horn), you’ll need to infer the state of the ancestor between each pair of organisms, down to the root of the tree. For this problem, Ree recommends Mesquite, a graphics-oriented program that handles questions of character evolution, patterns of species diversification, inquiries about population genetics, and more.

Open Mesquite and click on File → New. Indicate how many taxa you have in your tree, and at the prompt, create a character matrix. If the features you’d like to enter are discrete, click Categorical Matrix. If they are continuous, like height, click Continuous Matrix. Next enter your taxa and character states in the matrix provided. If it’s a measurement, enter the numbers without units. Finally, upload the NEXUS file containing your tree.

As with building trees, you can estimate ancestral character states with parsimony or maximum likelihood. Parsimony will find the solution with the fewest changes. (This is your only option with continuous characters.) Do a parsimony analysis by clicking on Analysis → Trace Character History → Parsimony Ancestral States. The inferred ancestral states will then appear at the nodes.

Maximum likelihood, on the other hand, takes into account branch lengths when determining an ancestral state. The program will be less certain about the state of an ancestor connecting two species that split millions of years ago. A small pie chart at each node indicates this probability. And lower probabilities will reverberate at later nodes. To run a maximum likelihood analysis, go to Trace → Reconstruction Method → Likelihood Ancestral States.
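For discrete characters, the idea of finding the fewest changes can be illustrated with Fitch's algorithm, a standard parsimony method for unordered states. This is a sketch of the principle only: whether Mesquite uses exactly this procedure is an assumption, and the tree, taxon names, and horn states below are made up for illustration.

```python
def fitch(tree, states):
    """Fitch parsimony on a rooted binary tree.

    tree:   a leaf name (str) or a tuple (left_subtree, right_subtree)
    states: {leaf_name: observed_state}
    Returns (candidate_states_at_this_node, minimum_number_of_changes_below_it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The two subtrees can agree, so no extra change is needed here.
        return shared, left_changes + right_changes
    # No shared state: keep both options and count one change.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical beetles: three horned species and one smooth species.
tree = (("beetle_a", "beetle_b"), ("beetle_c", "beetle_d"))
states = {
    "beetle_a": "horned",
    "beetle_b": "horned",
    "beetle_c": "smooth",
    "beetle_d": "horned",
}
print(fitch(tree, states))
```

Here the most parsimonious reading is a horned ancestor with a single loss, rather than horns arising independently.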

Presenting the tree: What’s life without style?

Anyone who’s looked at trees with more than 30 taxa knows they aren’t simple to read. Dozens of parallel and perpendicular lines blend, and it’s hard to see the story they tell. University of Arizona phylogeneticist Michael Sanderson recommends Dendroscope to make sense out of what you see.

Begin by uploading the NEXUS file containing your tree into Dendroscope. On the tool bar you’ll notice icons for different sorts of trees: ones with diagonal connections, with branches radiating out from the center, with the main groups separated by long branches, and others. Click on each of these to see what your tree will look like in each format—the relationships stay the same.

If you’d like to highlight one group of taxa, press the shift key and click on a branch within that group. This will change the color of these branches. Open the Format window, and under Edit, change the font, color, and width of lines. Once you like what you see, export the file as a JPEG, PDF, GIF, or another format.

For a killer 3-D presentation, upload your NEXUS file into a visualization program called Paloverde, and click on the icon illustrating the form of 3-D tree you prefer. Paloverde works well for visualizing moderately large trees, between 100 and 2,500 taxa.

Alternatively, if you have reliable information about where each organism was collected, you can spread your phylogeny over the surface of the globe with GeoPhylo, a program that projects phylogenies over Google Earth or NASA World Wind (you’ll have to download these programs first). Copy and paste the parenthetical line from the NEXUS file generated by your tree-building program into the Rooted Tree Box in GeoPhylo. Under the Coordinates and Data tab, enter the longitude and latitude where each taxon was found. Click Run, and your tree will be displayed over the Earth.

Andrew Hill, a graduate student at the University of Colorado, Boulder who developed GeoPhylo with his advisor, Robert Guralnick, used it to explore the spread of avian influenza. First, they constructed a phylogeny of influenza viruses, particularly those with drug resistance-conferring mutations. They then projected the tree over the globe, to see how those lineages arose and spread around the world.


2 METHODS

The program comes with pre-compiled integrated versions of RAxML for the major operating systems (MacOS, Windows, Linux), including the PTHREADS and SSE3 versions (Stamatakis, 2014), allowing the user to run faster analyses using parallel computing when multiple CPUs are available. Pre-compiled versions of RAxML-NG are provided for MacOS and Linux. A Windows version will be added when available from the RAxML-NG development team.

raxmlGUI 2.0 is structured into five different sections: INPUT, ANALYSIS, OUTPUT, RAXML and CONSOLE (Figure 1). The left panel, with the first three sections, lets the user load input files, set up the analysis, define substitution models and partitions, and choose the output path, among other features. The right panel lets the user select the RAxML version, see and run the command resulting from the input on the left panel, and see the progress and output from RAxML in the integrated console.

2.1 Basic setup

raxmlGUI 2.0 supports alignment files in different formats commonly used in phylogenetic analyses: extended PHYLIP, FASTA, NEXUS and MEGA (example files are available in the program's repository). Upon loading an alignment, the program parses the names attributed to each sequence (e.g. the species name) and creates a list of taxa in the Outgroup menu button, which can be used to root the tree based on a user-defined outgroup. Note that maximum likelihood trees can always be re-rooted after the analysis using tree-viewing software such as FigTree (Rambaut, 2012).

Phylogenetic analyses can be run based on different types of data: nucleotide sequences (DNA, RNA), amino acid sequences, discrete binary and multi-state characters (e.g. used for descriptions of morphological data). Since each data type requires a specific class of substitution models, raxmlGUI 2.0 automatically recognizes the data type from the loaded input file and provides the user with a drop-down menu showing all the substitution models compatible with the alignment.

2.2 Analytical pipelines

Analytical pipelines readily implemented in raxmlGUI 2.0 include a maximum likelihood search of the best tree, followed by a bootstrap analysis. Bootstrap support values are then drawn onto the maximum likelihood tree. After loading the alignment file and setting up the preferred substitution model (options for model testing directly from raxmlGUI 2.0 are described below), launching the default analysis only requires hitting the Run button on the right panel. Other options are available on the analysis panel to set the number of bootstrap pseudo-replicates. The analysis progress can be monitored in the console section of raxmlGUI 2.0. When the analysis is completed, a list of output files will be available in the output section. Clicking on the file names will open the files in the user's default program (e.g. FigTree for tree files). The most important output of this analysis is named ‘RAxML_bipartitions.input.tre’ (where input is by default the file name of the alignment) and includes the maximum likelihood tree topology and branch lengths with labels reporting the bootstrap scores for each node (bipartition) in the tree. All output files are by default saved in the same directory as the input file.

Several other types of analysis are available in raxmlGUI 2.0. Some analyses integrate multiple calls to RAxML to simplify the user experience in a single pipeline. For instance, the ML + thorough bootstrap option launches, in a single click, a sequence of three RAxML calls to (a) infer the maximum likelihood tree through a user-defined number of independent searches, (b) run a user-defined number of thorough non-parametric bootstrap replicates and (c) draw the bootstrap support values onto the maximum likelihood tree.
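The three-call sequence can be sketched as plain RAxML 8.x command lines. The exact arguments raxmlGUI 2.0 passes are not given in the text, so the flag choices below are assumptions based on the RAxML 8 command-line options (`-f d` for the ML search, `-b` for standard bootstrapping, `-f b` for drawing bipartitions). The binary name, run names, model, and seed are hypothetical defaults, and the helper only builds the commands without running them.

```python
import shlex


def thorough_bootstrap_commands(alignment, name, model="GTRGAMMA",
                                ml_runs=10, bootstrap_reps=100,
                                seed=12345, binary="raxmlHPC"):
    """Build the three RAxML 8.x calls as argument lists (assumed flags)."""
    common = [binary, "-m", model, "-p", str(seed)]
    # (a) ML search: best tree over several independent searches.
    ml_search = common + ["-f", "d", "-N", str(ml_runs),
                          "-s", alignment, "-n", f"{name}_ml"]
    # (b) Thorough (standard, non-rapid) bootstrap replicates.
    bootstrap = common + ["-b", str(seed), "-N", str(bootstrap_reps),
                          "-s", alignment, "-n", f"{name}_bs"]
    # (c) Draw bootstrap support values onto the best ML tree.
    draw_support = common + ["-f", "b",
                             "-t", f"RAxML_bestTree.{name}_ml",
                             "-z", f"RAxML_bootstrap.{name}_bs",
                             "-n", f"{name}_final"]
    return [ml_search, bootstrap, draw_support]


for command in thorough_bootstrap_commands("alignment.phy", "run1"):
    print(shlex.join(command))
```

Each list could be handed to `subprocess.run` in turn; the third call depends on the output files of the first two, which is why the GUI chains them in a fixed order.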

2.3 Automatic concatenation of alignments and partitions

An important feature of raxmlGUI 2.0 is the automated concatenation and partitioning of alignments, which simplifies the analysis of multiple genes or combination of different data types, for example, amino acid sequences and morphological data. After loading the first alignment, the user can add new ones to concatenate them into a single analysis. Upon loading additional alignments, raxmlGUI 2.0 performs the following tasks:

  • Parse the data to determine the data type (nucleotides, amino acids, multistate).
  • Parse the taxa names to make sure the concatenation of sequences occurs across matching taxa even if they are listed in different order among input files.
  • For any mismatch between taxa of different partitions, give option to automatically create sequences of missing data in the concatenated alignment or drop taxa with missing sequences in any partition.
  • Set default partitions for the new alignments and re-compute the concatenated partition.

These features facilitate the concatenation of different alignment files, the creation of the partition files and the generation of sparse matrices resulting from the combination of datasets with different and only partly overlapping taxonomic coverage. These tools also reduce the probability of errors stemming from manually merging sequences by matching taxa names. Additionally, raxmlGUI 2.0 provides an intuitive interface to create partitions within a single alignment file, including the possibility to specify codon based evolutionary models for coding nucleotide sequences (Figure 2). Finally, the user can load their own partition files, which must be provided in a RAxML compatible format (Figure 1).
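The concatenation steps listed above can be sketched in a few lines. This is an illustration of the logic, not raxmlGUI's code: the function names are hypothetical, taxa missing from a gene are padded with “?” (an assumed missing-data symbol), and the partition lines follow the RAxML-style `TYPE, name = start-end` layout.

```python
def concatenate(alignments, missing="?"):
    """Concatenate alignments across matching taxa.

    alignments: list of (partition_name, {taxon: aligned_sequence})
    Returns (supermatrix, partitions), where partitions holds 1-based,
    inclusive (name, start, end) column ranges.
    """
    taxa = sorted({taxon for _, aln in alignments for taxon in aln})
    supermatrix = {taxon: "" for taxon in taxa}
    partitions, start = [], 1
    for name, aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences in {name} differ in length")
        length = lengths.pop()
        for taxon in taxa:
            # Taxa absent from this alignment get a run of missing data.
            supermatrix[taxon] += aln.get(taxon, missing * length)
        partitions.append((name, start, start + length - 1))
        start += length
    return supermatrix, partitions


def partition_lines(partitions, datatype="DNA"):
    """Format partitions as RAxML-style partition file lines."""
    return [f"{datatype}, {name} = {first}-{last}" for name, first, last in partitions]


genes = [
    ("gene1", {"A": "ACGT", "B": "ACGA"}),
    ("gene2", {"B": "TT", "C": "TA"}),   # taxon order and coverage differ
]
matrix, parts = concatenate(genes)
print(matrix)
print(partition_lines(parts))
```

Matching by taxon name rather than by row position is what guards against the manual merging errors mentioned above.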

2.4 Support for both RAxML 8.x and RAxML-NG

In addition to RAxML 8.x, raxmlGUI 2.0 adds support for RAxML Next Generation (Kozlov et al., 2019), which provides new options and improved performance for very large datasets, which are typical for the analyses of genomic data. Among the novel methods implemented in RAxML-NG, and available through raxmlGUI 2.0, is the Transfer Bootstrap Expectation algorithm to quantify topological support for a tree (Lemoine et al., 2018). This algorithm has been shown to outperform the traditional bootstrap analysis (Felsenstein, 1985) when applied to large phylogenetic trees (thousands of tips). The user can select which version of RAxML they want to run from the GUI, and the available settings are automatically updated for the specific version. For guidelines on which RAxML version to use for particular objectives and datasets, please refer to Kozlov et al. (2019).

2.5 Model testing

One of the advantages of RAxML-NG over RAxML is its increased range of available substitution models for nucleotide and amino acid data. This feature also allows users to define different substitution models for each partition, for example, when analysing concatenated genes. To facilitate the use of these features, we implemented a model testing feature in raxmlGUI 2.0 that allows the user to select the best substitution model based on the corrected Akaike Information Criterion (AICc; Burnham & Anderson, 2002). Model testing is carried out using the program ModelTest-NG (Darriba et al., 2019), and is seamlessly integrated within raxmlGUI 2.0 through the OPTIMIZE button (Figure 1). The test can be run separately for each partition and the best model will be specified automatically for the following analysis. As for RAxML-NG, ModelTest-NG is currently provided for macOS and Linux, whereas Windows support will be added as soon as a compatible version is made available by the ModelTest-NG development team.
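As a reminder of what the selection criterion computes, here is a small Python sketch of AICc-based model choice. It only illustrates the standard formula, AICc = 2k - 2 ln L + 2k(k + 1)/(n - k - 1), and is not how ModelTest-NG is implemented. The log-likelihoods, parameter counts and sample size in the example are invented for illustration.

```python
from typing import Dict, Tuple


def aicc(log_likelihood: float, k: int, n: int) -> float:
    """Corrected Akaike Information Criterion.

    k = number of free parameters, n = sample size (e.g. alignment sites).
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def best_model(candidates: Dict[str, Tuple[float, int]], n: int) -> str:
    """Return the name of the candidate with the lowest AICc.

    `candidates` maps a model name to (log-likelihood, number of parameters).
    """
    return min(candidates, key=lambda name: aicc(*candidates[name], n))


if __name__ == "__main__":
    # Hypothetical values for one partition of 1000 sites.
    fits = {"JC": (-1050.0, 1), "GTR+G": (-1000.0, 10)}
    for name, (lnl, k) in fits.items():
        print(name, round(aicc(lnl, k, 1000), 3))
    print("selected:", best_model(fits, 1000))
```

Running the same selection once per partition mirrors the per-partition testing described above.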

2.6 Performance and implementation

There is no performance difference between running RAxML on the command line and running it from the GUI as raxmlGUI 2.0 just forwards all settings as parameters to the command line version of RAxML and runs that as a separate process. raxmlGUI 2.0 also supports a tabbed interface for running multiple analyses in parallel (Figure 1).

raxmlGUI 2.0 is built with Electron (Github Inc., 2020), a framework for creating cross-platform desktop applications using web technologies such as JavaScript, HTML and CSS. The user interface is built with Material-UI (Material-UI, 2020), a React (Facebook Inc., 2020) user interface framework with components that implement Google's Material Design (Google, 2020). The Electron base improves the portability and compatibility across platforms and operating systems compared to the previous version of raxmlGUI, which used an obsolete Python 2.x codebase. The installation is extremely simple and does not require any additional external libraries or dependencies, nor does it require admin rights on the machine.

On machines featuring multiple CPUs (i.e. most desktop and laptop computers) the GUI allows users to easily use RAxML's powerful parallel computing, which can drastically speed up the analyses. raxmlGUI 2.0 includes pre-compiled versions of the PTHREAD version of RAxML and a dropdown menu button to specify the desired number of CPUs allocated for the analysis.
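The forwarding approach described in this section can be sketched as follows. raxmlGUI 2.0 itself is written in JavaScript on Electron, so this Python snippet is only an analogy. The helper names and default values are assumptions, and the name of the RAxML executable differs between builds. The flags shown (`-T` threads, `-s` alignment, `-n` run name, `-m` model, `-p` random seed) are standard RAxML 8.x options.

```python
import shutil
import subprocess
from typing import List, Optional


def build_raxml_command(
    alignment: str,
    run_name: str,
    model: str = "GTRGAMMA",
    threads: int = 2,
    seed: int = 12345,
    binary: str = "raxmlHPC-PTHREADS",  # assumed executable name
) -> List[str]:
    """Translate GUI-style settings into a RAxML 8.x argument list."""
    return [
        binary,
        "-T", str(threads),   # number of threads (PTHREADS build)
        "-s", alignment,      # input alignment file
        "-n", run_name,       # suffix for output files
        "-m", model,          # substitution model
        "-p", str(seed),      # seed for the parsimony starting trees
    ]


def run_raxml(command: List[str]) -> Optional[subprocess.CompletedProcess]:
    """Run RAxML as a separate process; return None if the binary is not installed."""
    if shutil.which(command[0]) is None:
        return None
    return subprocess.run(command, capture_output=True, text=True)


if __name__ == "__main__":
    cmd = build_raxml_command("alignment.phy", "test_run", threads=4)
    print(" ".join(cmd))
    result = run_raxml(cmd)
    print("RAxML not found" if result is None else result.returncode)
```

Because the analysis runs in its own process, the front end adds no overhead to the computation itself, which is consistent with the performance claim above.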


Pôle Rhône-Alpes de Bioinformatique Site Doua

Version 5.0.4

NEW: seaview performs reconciliation between gene and species trees using Treerecs version 1.2
NEW: bootstrap support optionally with the "Transfer Bootstrap Expectation" method
NEW: trimming-rule to shorten long sequence names in phylogenetic trees
NEW: 64-bit version for the MS Windows platform
NEW: multiple-tree windows
NEW: seaview uses PHYLIP v3.696 to compute parsimony trees
NEW: seaview can be run without GUI using a command line
NEW: seaview drives the PhyML v3.1 program to compute maximum likelihood phylogenetic trees.
NEW: seaview drives the Gblocks program to select blocks of conserved sites.

SeaView is a multiplatform, graphical user interface for multiple sequence alignment and molecular phylogeny.

  • SeaView reads and writes various file formats (NEXUS, MSF, CLUSTAL, FASTA, PHYLIP, MASE, Newick) of DNA and protein sequences and of phylogenetic trees.
  • SeaView drives the programs muscle or Clustal Omega for multiple sequence alignment, and also allows the use of any external alignment program able to read and write FASTA-formatted files.
  • SeaView drives the Gblocks program to select blocks of evolutionarily conserved sites.
  • SeaView computes phylogenetic trees by
    • parsimony, using PHYLIP's dnapars/protpars algorithm,
    • distance, with NJ or BioNJ algorithms on a variety of evolutionary distances,
    • maximum likelihood, driving program PhyML 3.1.


    Download SeaView

    MacOS X ready for MacOS 10.3 - 11.0
    32-bit Linux on x86
    64-bit Linux on x86_64
    MS Windows self-extracting archive
    Solaris on SPARC
    Source code (also available at ftp://pbil.univ-lyon1.fr/pub/mol_phylogeny/seaview/archive/)
    Change log

    Note for MS Windows users: The downloaded file (seaview5.exe) is a self-extracting archive: open it, and it will create a folder called seaview5 on your computer. The window that appears when you open seaview5.exe allows you to choose where to place the seaview5 folder. This folder contains the seaview program, an example data file, a .html file, and 5 other programs (muscle, clustalo, phyml, Gblocks, treerecs) that seaview drives. This folder also contains seaview32bit.exe, a 32-bit version of the seaview program. If you run a 32-bit version of MS Windows (typically Windows XP), you can discard seaview.exe and use seaview32bit.exe instead.

    Note for Linux/Unix users: The downloaded archives contain the seaview executable itself, an example data file, a .html file, and 5 other programs (muscle, clustalo, phyml, Gblocks, treerecs) that seaview drives. These 5 programs and the .html file can either be left in the same directory as seaview, or be put in any directory of your PATH.

    Note for macOS users: Right after decompressing the .zip file, it may be necessary to ctrl-click the seaview icon and select "Open" in the menu that appears. Once this has been done, seaview can be opened normally by double-clicking its icon.

    Reference

    If you use SeaView in a published work, please cite the following reference:

    Gouy M., Guindon S. & Gascuel O. (2010) SeaView version 4: a multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Molecular Biology and Evolution 27(2):221-224.


    The authors acknowledge the contributions of the Arbor team, Luke Harmon of the University of Idaho, Chelsea Specht of the University of California at Berkeley, Robert Thacker of the University of Alabama at Birmingham, Jorge Soberon of the University of Kansas, Wes Turner of Simquest, Inc., and Jeff Baumes of Kitware, Inc. We are particularly indebted to Luke Harmon for his insightful editing of this paper and to his research group for their contributions to the formative evaluations of the user interface presented here. The authors also acknowledge an anonymous reviewer of a previous version of this paper for suggestions to improve future versions of PhyloPen, including converting the annotations to text using handwriting recognition, reorganizing the tree or collapsing a part of it to take advantage of the regained space, and group deletions of identical annotations passed up or down the tree.

    Software engineer by day, aspiring PhD student by night. I graduated from the University of Central Florida with a B.S. in Computer Science in 2011 and an M.S. in Computer Science in 2012. I work at CG Squared (CG2), Inc., an R&D company that develops commercial LIDAR visualization software and is also a defense contractor. I am a PhD student at UCF, with Dr. Hassan Foroosh as my current advisor. My PhD research currently focuses on compressive sensing in the field of computer vision, but I also have experience in software integration, accelerated processing, and visualization with 2D and 3D data and sensors (most notably in LIDAR point processing), as well as computer graphics and traditional, pen-and-touch, and 3D (à la Kinect) user interface design and implementation.

    Dr. Lisle received his Ph.D. in Computer Science from the University of Central Florida in 1998 and has focused on developing visualization technology primarily for medical and biological applications since then. Prior to completing his degree, Dr. Lisle developed custom hardware and software for applications in high-performance computer graphics while working as a staff member of Silicon Graphics, General Electric, and the University of Central Florida.

    Charles Hughes is a Pegasus Professor in the Department of Electrical Engineering and Computer Science, Computer Science Division, at the University of Central Florida. He also holds appointments in the School of Visual Arts and Design and the Institute for Simulation and Training (IST), is a Fellow of the UCF Academy for Teaching, Learning and Leadership, and holds an IPA appointment with the US Department of Veterans Affairs. He is co-director of the UCF Synthetic Reality Laboratory (http://sreal.ucf.edu). His research is in augmented reality environments with a specialization in networked digital puppetry (the remote control by humans of surrogates in the form of virtual or physical-virtual avatars). He conducts research on the use of digital puppetry-based experiences in cross cultural and situational awareness training, teacher and trainer education, social and interpersonal skills development, and physical and cognitive assessment and rehabilitation. He is author or co-author of over 170 refereed publications. He is an Associate Editor of Entertainment Computing and the Journal of Cybertherapy and Rehabilitation, and a member of the Program Committee and co-chair of Research Exhibits for IEEE VR 2013. He has active funding to support his research from the National Endowments for the Humanities, the National Institutes of Health, the National Science Foundation, the Office of Naval Research, Veterans Affairs and the Bill & Melinda Gates Foundation. His funding (PI or co-PI) over the last decade exceeds $15M.


    Results

    Metadata cleanup and organization

    While a minimum set of required metadata fields is a step forward, isolation sources, for instance, are currently entered as uncontrolled free text, which required time-consuming verification and validation before they could be integrated with genomic data for analyses. Moreover, public health agencies have different constraints on the level of metadata that can be made openly accessible. For example, the Centers for Disease Control and Prevention (CDC) provides only the year in which clinical cases occurred and does not communicate the geographical location of the cases. GenomeGraphR integrates NCBI metadata that has been cleaned and organized. We used a hierarchical classification/categorization of isolation sources built on the IFSAC scheme [13], chosen for its simplicity, acceptability, and use in the food safety attribution domain.

    A total of 139,754 isolates of S. enterica had been submitted to NCBI from 2010 to 2018, as of July 31st, 2018. The isolation sources of 812 isolates (0.6% of all strains) were not classified because of missing or unclear/unintelligible data. For L. monocytogenes, only 59 isolates out of a total of 16,567 could not be assigned to any of the defined isolate categories. The distribution of isolates by major isolate categories is presented in Table 1. The categorization scheme applied to L. monocytogenes and S. enterica strains consists of the eight-level hierarchy for categorization of foods developed by IFSAC [13], extended to include environmental and animal (non-food) sources and applied here to the strain isolation sources from NCBI. Fig 2 illustrates the hierarchy for the non-clinical strains and the volume of strains associated with each level using a Sankey plot.

