
Why is it harder to sequence plant genomes than animal genomes?


Plants seem to be less complex organisms than animals, yet fewer plant genomes have been sequenced. Is that because plant genomes are more complex, for example in terms of regulatory regions and transposons that make sequencing more difficult, or are there other reasons why fewer plant genomes than animal genomes have been sequenced?


The authors of this 2012 review article summarize the problem well in their introduction:

In contrast to the tremendous advances in throughput, assembling sequencing reads remains a substantial endeavor, much greater than the sequencing efforts alone would suggest [22-24]. Large complex plant genomes remain a particularly difficult challenge for de novo assembly for a variety of biological, computational and biomolecular reasons. Plant genomes can be nearly 100 times larger [25] than the currently sequenced bird [26], fish [27] or mammalian genomes [28]. In addition they can have much higher ploidy, which is estimated to occur in up to 80% of all plant species [29], and higher rates of heterozygosity and repeats [30] than their counterparts in other kingdoms. Furthermore, the gene content in plants can be very complex, as shown by the presence of large gene families and abundant pseudogenes with nearly identical sequences derived from recent whole genome duplication events and transposon activity [13]. Plants tend to have high copy chloroplasts and mitochondria organelles, which complicate assembly of their remnants in the nuclear genome and skew coverage levels [12]. Finally, it is often very difficult to extract large quantities of high-quality DNA from plant material, making it difficult to prepare proper libraries for sequencing.

From: Schatz, Witkowski and McCombie. "Current challenges in de novo plant genome sequencing and assembly". Genome Biology (2012). 13:243

As you can see, all of the key reasons have been touched on by the various people who have commented below your question.


3. Challenging Features of Plant Genomes

3.1. Sampling

3.2. Genome Size and Complexity

3.3. Transposable Elements

3.4. Heterozygosity

3.5. Polyploidy

3.6. Gene Content and Gene Families

270 kbp of the mitochondrial genome inserted into Chromosome 2 of Arabidopsis [62]. But gene duplication is regarded as a major force in the origin of new genes and genetic functions. For example, C4 photosynthesis evolved from the C3 pathway and has appeared independently on at least 50 occasions during plant evolution [63]. Other examples of gene duplication are the striking increase in the number of starch‑associated genes in papaya (39) with respect to Arabidopsis (20), or the expanded number of kinase family members, cytochromes P450 and the enzymes engaged in plant secondary metabolism [64]. However, recent comparisons of the Arabidopsis, poplar, grapevine, papaya and rice genomes estimated that the angiosperm ancestor contained between 12,000 and 14,000 genes [15]. As a result, more than half of plant genes belong to a gene family, 45% of them with the same function but different expression patterns [65]. Specific strategies are required to distinguish alleles from paralogues when sequencing natural heterozygous isolates, although these are not expected to be very successful in the near future [59]. Moreover, the presence of out-paralogues produced by duplication prior to the divergence of two lineages and in-paralogues produced in each lineage, together with the multiple rounds of polyploidy in plant lineages, accentuates these problems, as divergence between paralogues occurs at different paces.

3.7. Non-Coding RNAs

3.8. Widely Distributed Repetitive Sequences (Low Complexity Sequences)

250 tandem duplications each of 10 kbp on Chromosome 2 of Arabidopsis) and between chromosomes (e.g., 4 Mbp long regions between Chromosomes 2 and 4, or 700 Mbp long regions between Chromosomes 1 and 2 in Arabidopsis; 3 Mbp at the termini of the short arms of Chromosomes 11 and 12 in rice, as well as Chromosomes 5 and 8 in sorghum) [62,74].


Science Says: Why scientists prize plant, animal genomes

This Tuesday, Feb. 12, 2019 photo shows male mosquitoes at the Vosshall Laboratory at Rockefeller University in New York. In 2018, researchers at the lab published a much-improved description of the DNA code for a particularly dangerous species of mosquito: Aedes aegypti, notorious for spreading Zika, dengue and yellow fever. Associated Press

In this Tuesday, Feb. 12, 2019 photo, researcher Ben Matthews speaks in a room housing mosquitos in the Vosshall Laboratory at Rockefeller University in New York. Knowing the DNA sequence lets scientists manipulate it with gene editing techniques, said Matthews, who was part of the international team that published the refined description of the mosquito genome in November 2018. Associated Press

In this Tuesday, Feb. 12, 2019 photo, PhD student Krithika Venkataraman mates mosquitoes by blowing males into a container housing females at the Vosshall Laboratory of Rockefeller University in New York. Researchers nearly doubled the known size of a family of genes that help mosquitoes sense information from their environment, such as the odor of humans. That was “totally, mind-blowingly, unexpected,” Leslie Vosshall says. Associated Press

In this Tuesday, Feb. 12, 2019 photo, research assistant Anjali Pandey sorts mosquito larvae with edited DNA in the Vosshall Laboratory at Rockefeller University in New York. Their eyes glowed red or blue under her microscope to indicate they were carrying genetic modifications. Associated Press

This undated microscope image provided by researcher Ben Matthews of Rockefeller University in February 2019 shows mosquito larvae studied at the Vosshall Laboratory in New York. In November 2018, researchers at the lab published a much-improved description of the DNA code for a particularly dangerous species of mosquito: Aedes aegypti, notorious for spreading Zika, dengue and yellow fever. (Ben Matthews/Rockefeller University via AP) Associated Press

NEW YORK -- Just about every week, it seems, scientists publish the unique DNA code of some creature or plant. Just in February, they published the genome for the strawberry, the paper mulberry tree, the great white shark and the Antarctic blackfin icefish.

They also announced that, thanks to a crowdfunding campaign, they'd produced the genome of Lil BUB, a female cat with a large internet following.

That followed a notable advance in January: an improved genome for the axolotl, a salamander renowned for regrowing severed limbs and other body parts.

Scientists have been uncovering genomes for quite a while. The first from an animal - a worm - came in 1998. Now, the technology has advanced far enough that scientists last year announced a project to produce the genomes for all life forms on Earth other than bacteria and single-celled organisms called archaea. They called it a "moonshot for biology."

But what's the point of uncovering new genomes?

For scientists, a detailed look under the hood of their favorite organism provides a foothold for learning the deepest secrets of their objects of attention. It leads to discoveries about how life works, and possibly how to prevent disease.

Take the mosquito. Late last year, researchers published a much-improved description of the DNA code for a particularly dangerous species of mosquito: Aedes aegypti, notorious for spreading Zika, dengue and yellow fever.

That achievement came from analyzing the DNA of 80 mosquito brothers. They were born in Leslie Vosshall's lab at Rockefeller University in New York, where thousands of mosquitoes swarmed in cages recently as Krithika Venkataraman was trying to make some more.

She stuck a tube that protruded from her mouth like a straw into a transparent cube filled with male mosquitoes. Then she repeatedly sucked about 30 males at a time into the tube. She counted them, and then blew them into another cube that housed females. Before long, the two sexes were mating.

You can think of a genome as an instruction book for building a living thing. Its language is a four-letter alphabet, whose letters stand for the four compounds that make up the innards of the DNA molecule. The order of those compounds along the molecule is the code; it creates "words" that we call genes.

The mosquito genome, for example, is about 1.28 billion letters long, a bit less than half the length of the human version. Knowing the DNA sequence lets scientists manipulate it with gene editing techniques, said Ben Matthews of the Vosshall lab, who was part of the international team that published the refined description of the mosquito genome last November.

And once researchers started analyzing that version of the DNA code, discoveries began to pop out.

- They nearly doubled the known size of a family of genes that help mosquitoes sense information from their environment, such as the odor of humans. That was "totally, mind-blowingly, unexpected," Vosshall said. (Vosshall's salary is paid by the Howard Hughes Medical Institute, which also supports The Associated Press Health & Science Department.)

Further study may reveal surprises about what mosquitoes pay attention to, Vosshall said. And that could lead to better lures for mosquito traps, as well as better repellents. Maybe scientists can find something "10,000 times more disgusting" to a mosquito than the old standby, DEET, she said.

- They found new details about genes that let some mosquitoes resist certain insecticides. That's a possible step toward predicting what insecticides would be useless for fighting certain populations, as well as a potential lead for coming up with new chemical weapons against the insect.

- They found previously unknown targets for a major class of insecticides. That could open the door to designing new versions that target mosquitoes while sparing beneficial insects and posing less risk to people.

- They narrowed the search for genetic variants that prevent some Aedes aegypti mosquitoes from infecting people with dengue, a severe flu-like illness that sickens millions every year. If those variants can be identified, scientists might use genetic engineering to reproduce them in some mosquitoes, which could then be released to spread the variants through wild populations, Vosshall said. Those variants, or others, might also work for reducing threats of spreading Zika and yellow fever, Vosshall and Matthews said.

- A similar strategy might be used to make mosquito populations overproduce males. That would reduce mosquito bites in the short term - only females bite - and open the door to shrinking wild populations through genetic engineering. The new genome revealed details of the DNA stretch that makes mosquitoes develop as males, which Matthews called "step one" in pursuing the make-more-males strategy.

The salamander genome published in January built on a previous publication by European scientists last year. Although its genome is about 10 times the size of the human one, which makes the analysis harder, the axolotl's regenerating capabilities are an obvious lure.

Axolotls can replace "almost anything you can cut off of them, as long as you don't cut off their heads," says Jeramiah Smith of the University of Kentucky in Lexington, an author of the more recent genome paper.

But Smith points to another trick that might pay off sooner for human medicine: The salamander can also heal large wounds without scarring.

As for learning how to let people grow back a severed arm, he figures that's a long way off.

"That probably won't be useful for me," joked Smith, who's 42. "I'll be dead, so I won't need to grow my arm back."

And Lil BUB? She's the size of a kitten even though she's 8 years old, and has a number of other odd traits. Scientists looked for genetic mutations, and found altered genes that appear to be responsible for her extra toes and for a rare bone disease.

Follow Malcolm Ritter at @MalcolmRitter.

The Associated Press Health & Science Department receives support from the Howard Hughes Medical Institute's Department of Science Education. The AP is solely responsible for all content.


Functional genomics

Satoshi Tabata (Kazusa DNA Research Center, Japan) spoke about their analyses of Lotus japonicus. They used a SAGE approach to study the root to nodule transition, identifying stage-specific sequences, several of which have been verified by RT-PCR. The sequences are compared to their EST resources to identify genes; where no match is obtained, they use 3′ RACE to obtain longer fragments. They now plan to apply this approach to other tissues. They have generated an EST library from seed points and are sequencing 3′UTRs of their clones. In addition, they are using genes from other legumes to PCR screen genomic libraries; the clones identified are sequenced for gene prediction. They have TAC and BAC libraries of the genome and are end sequencing 640 clones. Based on the 914 genes identified so far, the average L. japonicus gene is 2.7 kb long and 76% of genes have introns, with an average intron length of 380 bp and an average exon length of 276 bp. Other work includes searching clones for SSRs and using PCR testing for polymorphism in the parents of the mapping population. They are also looking for SNPs. For more information about the project, visit: http://www.kazusa.or.jp/lotus/.

Khalid Meksem (Southern Illinois University, USA) spoke about his work in integrating physical maps and genetic markers in a range of plants. His group are developing BAC libraries and physical maps for plants including soybean, Arabidopsis, Lotus and moss and for fungi including Ustilago and Fusarium (http://hbz.tamu.edu). For this they are using a proprietary PAGE fingerprinting method and enzyme kit. The soybean physical map has 95 322 fingerprinted BACs covering 11.8X soybean haploid genomes. This map has been integrated with 256 genetic markers on 20 linkage groups, and 313 microsatellites will soon be added to it. They are using multiplex hybridisation methods to integrate ESTs in batches of 512 to confirm the map. The map has been used to develop new genetic markers (microsatellites, InDels and SNPs) in regions of the genome that lack conventional genetic markers. They are also making a comparative map of a wild strain; they have found a homologue of a Glycine max gene of interest, but it was surrounded by different genes. In their two regions of interest, they have seen 1.5 genes/10 kb and found that only 50% of genes were covered by the EST resource, indicating a need for more ESTs.

Don Langmore (Rubicon Genomics) presented their OmniPlex technology and its applications in functional genomics. This direct genomic sequencing (requiring no cloning) can be used for sequencing transgene junctions, insertion mutagenesis junctions and gene regions (based on EST data) in a range of bacteria and eukaryotes.

Lynn Jablonski demonstrated the use of tools from Integrated Genomics Inc. for comparative analysis and functional reconstruction. She spoke mainly about ERGO, a database incorporating public and proprietary sequence data, which is designed for use in discovering interactions within the cell. The database expands on the PUMA and WIT databases for metabolic reconstruction.

Michael Udvardi (Max Planck Institute, Golm, Germany) presented work using transcriptome and proteome analysis for the understanding of organ differentiation. He detailed the current knowledge of plant cell differentiation during nodule formation in leguminous plants, highlighting the role of transcriptomic, proteomic and metabolomic analyses in his group's most recent discoveries (see Trevaskis et al., this issue).

Bradley Till (FHCRC, USA) presented TILLING (Targeting Induced Local Lesions IN Genomes). This method uses EMS point mutagenesis, followed by self-crossing and then production of a parallel seedbank and DNA resource for PCR-based mutation detection. The mutation detection step relies on CEL1, a novel plant endonuclease that preferentially cleaves mismatches in heteroduplexes between wild type and mutant. This is used to digest the products of gene-specific PCR amplification of DNA from mutants, using differently labelled primers. The products of the digest are two fragments (whose lengths should add up to the full size of the expected PCR product), fluorescing at different wavelengths. This allows detection of any mutants on a gel, on which eight samples can be pooled per lane. In a trial of the system, they also detected a natural variation in one of the supplied ecotypes and have gone on to show that it can be used to type accessions for SNPs. They have developed PARSESNP to store information on the mutations, such as which nucleic acid change is involved, and the effect. Backed by NSF funding, the group have embarked on an Arabidopsis TILLING program (http://blocks.fhcrc.org/~steveh/Welcome_to_ATP.html) in which they can screen ∼12,000 samples/day/1 kb region. They have also developed a tool, called CODDLE, which will suggest the best region for TILLING within a gene of interest. TILLING is now being applied to a wide range of organisms.
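The fragment-size check mentioned in the talk (the two digest fragments should add up to the full PCR product) can be illustrated with a small sketch. This is not taken from PARSESNP, CODDLE or any other TILLING software; the function names, the coordinate convention and the sizing tolerance are all assumptions made for illustration.

```python
from __future__ import annotations


def cel1_fragments(amplicon_len: int, mismatch_pos: int) -> tuple[int, int]:
    """Return the two fragment lengths expected when a heteroduplex of
    `amplicon_len` bp is cut once at a mismatch.

    Assumed convention: `mismatch_pos` is the number of bases to the
    left of the cut site.
    """
    if not 0 < mismatch_pos < amplicon_len:
        raise ValueError("mismatch must lie strictly inside the amplicon")
    return mismatch_pos, amplicon_len - mismatch_pos


def fragments_consistent(amplicon_len: int, frag_a: int, frag_b: int,
                         tolerance: int = 10) -> bool:
    """Check that two observed fragment sizes sum to the amplicon length,
    within an assumed gel-sizing tolerance (in bp)."""
    return abs((frag_a + frag_b) - amplicon_len) <= tolerance


# A hypothetical 1,000 bp amplicon with a mismatch 300 bp from the left end.
print(cel1_fragments(1000, 300))              # (300, 700)
print(fragments_consistent(1000, 305, 698))   # True
```

A pair of bands that fails this check would be a candidate for a gel artefact rather than a genuine mismatch cleavage.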

D.B. Goodenowe spoke about the metabolic profiling approach of Phenomenome Discoveries Inc. They take a non-targetted approach, detecting just those metabolites in the samples, rather than trying to detect all metabolites. They directly infuse samples into an ion cyclotron mass spectrometer, relying on its ability to give very high mass accuracy to allow them to assume that they can extract correct empirical formulas from the data.

Peter Gresshoff (University of Queensland, Australia) spoke about his nodule-related gene discovery and expression profiling work. His group are interested in root development and the establishment of nodules. Some genes are known to be involved in this process, such as those in the NOD factor signal transduction pathway; however, not all of the genes are known, and the regulatory mechanisms are not completely understood either. They have taken an insertional promoter trapping approach, using a promoter-less GUS reporter gene. This gives information on gene expression patterns and regulation of genes. They have found an early gene, which has a role in vasculature; this gave mutants with only one nodule. They also identified a lateral root and nodule specific insertion and are currently working out which gene's promoter is causing the pattern. In another approach, his group have made a microarray, with 4000 unique ESTs and other genes represented. Using this they identified 10 differentially expressed genes, which they verified using real-time RT-PCR and Northern analysis. In most cases the results matched well: plotting the array data against qRT-PCR gave a very good correlation score of r² ∼ 1, with linearity over 6 logs.




Results

We have developed an information-retrieval based method to identify all long identical multispecies elements (LIMEs) shared by two or more genomes, given the element’s minimal length (Materials and Methods). The method is alignment-free, allowing us to detect both syntenic and nonsyntenic sequences. We used this method to identify and compare sequences of extreme conservation shared between a set of animal genomes and a set of plant genomes (Fig. 1 and SI Appendix, Figs. S1–S5). Specifically, we first obtained a comprehensive set of LIMEs 100 bp or longer for six animal genomes: dog (Canis familiaris, Cf), chicken (Gallus gallus, Gg), human (Homo sapiens, Hs), mouse (Mus musculus, Mum), macaque (Macaca mulatta, Mam), and rat (Rattus norvegicus, Rn). We also obtained all LIMEs of 100 bp or longer among the six publicly available large (>100 Mbp) plant genomes: Arabidopsis (Arabidopsis thaliana, At), soybean (Glycine max, Gm), rice (Oryza sativa, Os), cottonwood (Populus trichocarpa, Pt), sorghum (Sorghum bicolor, Sb), and grape (Vitis vinifera, Vv).
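To make the idea of an alignment-free search for identical shared elements concrete, here is a minimal sketch for two sequences. It is not the authors' information-retrieval method (which is described in their Materials and Methods and handles two or more whole genomes); it swaps in a simple seed-and-extend scheme over exact k-mers, ignores reverse complements, and the function name `shared_exact_elements` is hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def shared_exact_elements(a: str, b: str, min_len: int) -> set[str]:
    """Return the maximal identical substrings of length >= min_len
    shared by sequences `a` and `b` (forward strand only)."""
    if min_len <= 0:
        raise ValueError("min_len must be positive")

    # Index every min_len-mer of `a` by its start positions.
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(a) - min_len + 1):
        index[a[i:i + min_len]].append(i)

    found: set[str] = set()
    for j in range(len(b) - min_len + 1):
        for i in index.get(b[j:j + min_len], ()):
            # A seed that can be extended to the left belongs to a match
            # that is reported from an earlier seed, so skip it here.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend the exact match to the right as far as possible.
            k = min_len
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            found.add(a[i:i + k])
    return found


# Toy example with a short minimal length; the paper uses 100 bp.
print(shared_exact_elements("TTTACGTACGTAGGG", "CCCACGTACGTAGAA", 8))
# {'ACGTACGTAG'}
```

Because matches are found by exact lookup rather than by aligning neighbouring regions, the positions of a shared element in the two sequences need not correspond, which is what allows nonsyntenic elements to be detected.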

Fig. 1. Structural taxonomy of plant and animal LIMEs. The plant genome set consists of Arabidopsis, soybean, rice, cottonwood, sorghum, and grape. The animal genome set is dog, chicken, human, mouse, macaque, and rat. LIMEs (≥100 bp) are identified for every pair of plant genomes and every pair of animal genomes, and categorized. A pie chart shows the percentage contribution of each LIME category connected with the pie chart. Because of the lack of annotation for all species involved, the last classification level, Origin Class, includes the percentages for Arabidopsis LIMEs in plants (*) and the percentages for the animal LIMEs in human, mouse, rat, and chicken (**); the absolute numbers are given in SI Appendix, Table S5. We have defined telomeric repeats as syntenic for clarity. LINEs, long interspersed elements; SINEs, short interspersed elements.

The comparative analysis of flowering plant and animal LIMEs revealed key similarities and differences between the two groups (Fig. 1). Both groups include repetitive LIMEs, consisting of multiple copies of one or two repeated motifs, as well as nonrepetitive, or complex, LIMEs. Furthermore, each group has LIMEs that occur in multiple copies in a genome and are often spread across multiple chromosomes. Finally, animal and plant LIMEs are likely to owe their origins to several mechanisms, including purifying selection, transfer of genetic material from organellar to nuclear genomes, and de novo sequence manufacturing; some of these mechanisms may be unique to plants.

LIMEs in Animal Genomes.

We first compared the complex LIMEs shared between the human, mouse, and rat genomes (2004 builds) found by our algorithm with the UCEs obtained by Bejerano et al. (3). We found that in addition to identifying all 481 previously reported UCEs, our method identified 12 previously undescribed elements of 200 bp or longer (more details are provided in Materials and Methods and SI Appendix, Table S1). Unexpectedly, 4 of those 12 elements were nonsyntenic (SI Appendix, Table S2), including two LIMEs originating from retrotransposition events (SI Appendix, sections S1 and S2). Overall, there were 1,572,580 unique complex elements of at least 100 bp in the animal set of six genomes: 19% (297,329) had multiple copies in a single genome, and 10% (157,723) had multiple copies in multiple genomes, including 95 having multiple copies in at least four genomes. These 95 were merged into just 12 “supersequences” based on overlaps in their genomic locations. A BLAST search of these elements against the nonredundant (NR) nucleotide database at the National Center for Biotechnology Information (NCBI) (24) revealed exact matches to snRNAs, such as human 7SL, RNU1-6, RNU1-9, and RNU6-1, as well as heterogeneous nuclear ribonucleoprotein A1 from horse (more details are provided in SI Appendix, section S1). SI Appendix, Fig. S6 shows the distribution of multicopy complex and repetitive LIMEs. Most of the complex LIMEs were shared between human and macaque (SI Appendix, section S3), whereas mouse had most of the repetitive LIMEs. Complex elements were often near each other and sometimes overlapped. For instance, in human, 92% (7,384,943 of 7,960,078) of the complex elements overlapped; as a result, they could be grouped into just 668 clusters (2 elements are assigned to the same cluster if they are within 60,000 bp). There were only 11 single-element clusters, whereas the largest cluster contained 295,876 elements.
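The clustering rule quoted above (two elements share a cluster if they are within 60,000 bp) amounts to single-linkage grouping along a chromosome. The sketch below shows one way to do that; it is an illustration, not the authors' code. It assumes that "within 60,000 bp" refers to the gap between the end of the current cluster and the start of the next element, and the function name `cluster_elements` is hypothetical.

```python
from __future__ import annotations


def cluster_elements(elements: list[tuple[int, int]],
                     max_gap: int = 60_000) -> list[list[tuple[int, int]]]:
    """Group (start, end) intervals on one chromosome into clusters.

    An element joins the current cluster when it starts no more than
    `max_gap` bp after the furthest end seen so far in that cluster
    (overlapping elements therefore always cluster together).
    """
    clusters: list[list[tuple[int, int]]] = []
    cluster_end = 0
    for start, end in sorted(elements):
        if clusters and start - cluster_end <= max_gap:
            clusters[-1].append((start, end))
            cluster_end = max(cluster_end, end)
        else:
            clusters.append([(start, end)])
            cluster_end = end
    return clusters


# Hypothetical coordinates: three nearby elements and one distant element.
elems = [(100, 250), (200, 320), (50_000, 50_120), (200_000, 200_150)]
print([len(c) for c in cluster_elements(elems)])  # [3, 1]
```

Sorting once and sweeping left to right keeps the grouping linear after the sort, which matters when millions of element locations are involved.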

There were 241 distinct motifs that made up the repetitive LIMEs in animals (SI Appendix, Table S3), and they ranged from 2 to 30 bp, with an average length of 8.2 bp (SD = 4.7 bp). There were 127 motifs that were shared by three species, 74 shared by four, 48 shared by five, and 28 shared by all six species. Although most repetitive elements overlapped, this was not universally the case. For instance, there were 8,331 nonoverlapping repetitive elements in animals that were dispersed across 90% (142 of 157) of the chromosomes, except for 15 chromosomes in chicken.

Of the complex LIMEs shared by at least two animal genomes (Fig. 1), there were 1,120 (average length = 136.55 bp, SD = 41.60 bp) shared by all six genomes, with 76 LIMEs of length greater than 200 bp. Of those 76 LIMEs, 33 were nongenic in human, 43 were genic, and none shared more than 50% sequence identity with chicken when considering the surrounding genomic regions (±40,000 bp). In fact, 3 of the 76 LIMEs had only 2–3% sequence identity to chicken (an example is provided in SI Appendix, Fig. S7). This contrasts sharply with the results reported previously in animals, where UCEs were all from highly similar genomic regions. In fact, the term “ultraconserved,” arguably, does not apply in these cases.

LIMEs in Plant Genomes.

Using methods identical to those utilized for animal genomes, we determined the comprehensive set of LIMEs shared between six plant species (Fig. 1). Because extreme conservation between three or more plant species had never been addressed before, we focused on characterizing plant LIMEs in this work, determining their possible origins and comparing them with the animal LIMEs. Unlike in animal genomes, repetitive LIMEs were prevalent in all six plant genomes (Fig. 2A). An average plant repetitive LIME was 143 bp long, which is shorter than an average complex LIME (175 bp; Fig. 3A). The relative ratios of repetitive LIMEs to complex LIMEs were similar across the plant genomes considered (Fig. 3B); the Arabidopsis genome was typical in its possession and distribution of repetitive and complex LIMEs. We detected 214 unique complex LIMEs shared by Arabidopsis and at least one of the remaining five genomes (Fig. 2 B and C), including 91 unique complex elements shared between Arabidopsis and rice, 3.64-fold more than had previously been identified (23). In Arabidopsis, 35 of the 91 complex LIMEs are nonoverlapping and (when considering multiple copies) 81 overlap with other complex elements (SI Appendix, section S4), whereas in rice, 69 of the 91 complex LIMEs are nonoverlapping and 72 overlap with other complex elements. The repetitive elements constituted the majority of Arabidopsis LIMEs [1,685 distinct LIMEs (∼88.7%)], but the repertoire of repeated motifs was surprisingly small: we found that a repetitive LIME contained copies of either one or two motifs from a total set of six motifs of 2–7 bp, with each occurring up to 323 times in tandem. The majority of Arabidopsis LIMEs were nongenic: of 26,367 unique locations of repetitive LIMEs, 4,015 corresponded to genic sequences and 22,352 to nongenic sequences; of the 305 locations of complex LIMEs, 169 were genic and 136 were nongenic. Using the Arabidopsis information resource annotation framework TAIR (25), we also categorized all genic LIMEs as exonic, partly exonic, or possibly intronic, based on their overlap with annotated gene models. We found 3,251 exonic, 713 partly exonic, and 220 possibly intronic locations of both repetitive and complex LIMEs.
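The categorization by overlap with gene models can be sketched as a simple interval test. The exact criteria used with the TAIR annotation are not spelled out in this section, so the definitions below are assumptions: "exonic" means fully contained in one exon, "partly exonic" means overlapping an exon without being contained in it, and "possibly intronic" means inside the gene span but touching no exon. The function name and the half-open coordinate convention are also hypothetical.

```python
from __future__ import annotations


def classify_element(start: int, end: int,
                     gene: tuple[int, int],
                     exons: list[tuple[int, int]]) -> str:
    """Classify the element [start, end) against one gene model.

    `gene` is the (start, end) span of the gene and `exons` lists the
    (start, end) spans of its exons, all half-open.
    """
    gene_start, gene_end = gene
    if end <= gene_start or start >= gene_end:
        return "nongenic"
    if any(es <= start and end <= ee for es, ee in exons):
        return "exonic"
    if any(start < ee and es < end for es, ee in exons):
        return "partly exonic"
    return "possibly intronic"


# A hypothetical gene with two exons.
gene = (1000, 5000)
exons = [(1000, 1500), (3000, 3600)]
for element in [(1100, 1250), (1450, 1600), (2000, 2150), (6000, 6100)]:
    print(element, classify_element(*element, gene, exons))
```

The counts reported in the text are consistent with a three-way split of this kind: 3,251 + 713 + 220 = 4,184 genic locations, which equals the 4,015 repetitive plus 169 complex genic locations.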

Fig. 2. Plant LIMEs are remarkably diverse in their structure and function. (A) Phylogenetic trees of the six animal and six plant species for complex and repetitive LIMEs. Mam corresponds to Macaca mulatta, and Mum corresponds to Mus musculus. A node number (bold) is the number of elements common to each species in a subtree below. All LIMEs ≥100 bp are considered for each subtree. At, Arabidopsis thaliana; Gm, Glycine max (soybean); Hs, Homo sapiens (human); Mam, Macaca mulatta (macaque); Mum, Mus musculus (mouse); Os, Oryza sativa (rice); Pt, Populus trichocarpa (cottonwood); Rn, Rattus norvegicus (rat); Sb, Sorghum bicolor (sorghum); Vv, Vitis vinifera (grape). (B) LIMEs in the Arabidopsis (At) genome, depicted as colored ticks with complex LIMEs above and repetitive LIMEs below each chromosome (chr) sequence. Tick color corresponds to the number of genomes, including the At genome, sharing a LIME: red for three genomes, orange for four, light blue for five, and dark blue for six. When two LIMEs are 45 kbp or less apart, they are grouped in the same box. Once there are more than 20 LIMEs in such a box, the box size is unchanged but correct proportions of LIMEs shared by three, four, five, and six genomes are depicted by the relative thickness of the colored parts. Orange numbers specify the total number of LIMEs per box, and blue corresponds to the motif ID for one or multiple repetitive LIMEs. Identified centromere positions are shown as gray boxes. (C) Detailed representation of a chromosome 3 region that includes 2 LIMEs shared by all six genomes, and the nearest genes.

Fig. 3. Each identified plant LIME could be classified into one of two basic structural classes: repetitive and complex LIMEs. (A) Distribution of LIME lengths in four groups of elements: single-copy complex, single-copy repetitive, multiple-copy complex, and multiple-copy repetitive. (B) Distribution of repetitive and complex LIMEs across six genomes (as percentage of total). At, Arabidopsis thaliana; Gm, Glycine max (soybean); Os, Oryza sativa (rice); Pt, Populus trichocarpa (cottonwood); Sb, Sorghum bicolor (sorghum); Vv, Vitis vinifera (grape). (C) Basic types of sequence motifs used by repetitive LIMEs. In total, there are 12 unique motifs 2–7 bp long.

Taxonomy of Plant LIMEs Based on Their Possible Origins.

Syntenic analysis using the Comparative Genomic platform CoGe (26) revealed that complex plant LIMEs are nonsyntenic (Fig. 1). This finding unexpectedly contrasts with the syntenic nature of the mammalian UCEs (3). The lack of synteny further supports our contention that some plant LIMEs are not inherited vertically. Indeed, we suggest there are three possible origins for the identical sequences found in our set of plant genomes: vertical inheritance, horizontal transfer, and de novo manufacturing. Although vertical inheritance of nuclear material is straightforward, detecting it can be confounded by extensive genome rearrangements. For instance, to determine whether the four overlapping LIMEs from Table 1 are conserved in species other than the six plants considered above, we used the shortest one (107 bp) in a BLAST search against the NR nucleotide database at the NCBI (24) and found exact copies of this LIME in the mature coding sequence of 18S (cytoplasmic), 26S (organellar), and 28S (cytoplasmic) rRNA genes of 76 eukaryotic organisms, including plants, animals, and fungi (more details are provided in SI Appendix, section S5).

Four LIMEs common to all six species and papaya

Horizontally Inherited LIMEs.

The sequences of proposed horizontal inheritance detected by our algorithm could be of natural origin or artifactual. Some of the identified elements are likely the products of sequence assembly errors and/or bacterial sequence insertions (bacterial sequences were exclusively from Escherichia coli). On the other hand, we found several Arabidopsis repetitive elements associated with a transposon. A copy of a repetitive element containing the motif “GAGA” was found within an Arabidopsis gene annotated as “hAT-like transposase family” (TAIR gene ID AT5G28673); two other copies of this element were identified in genes annotated as “probable serine/threonine-protein kinase” (TAIR gene ID AT3G59410) and “unknown protein” (TAIR gene ID AT1G01725). Another copy of the same repetitive element, located on chromosome 2 of Arabidopsis, is classified as nongenic. SI Appendix, Fig. S8 shows the mapping of mitochondrial to nuclear genomes in Arabidopsis, rice, and sorghum. Arabidopsis has nine exonic LIMEs (SI Appendix, Table S4) that were derived from mitochondrial insertions. The cross-species genomic-to-genomic and mitochondrial-to-mitochondrial comparisons of these LIMEs revealed that the surrounding mitochondrial and nuclear sequences had rearranged and/or diverged, although still retaining these few elements throughout evolution (more details are provided in SI Appendix, section S6).

De Novo Sequence Manufacturing.

A process we refer to as “de novo sequence manufacturing” could be another possible source of identical cross-species sequences in plants. For example, telomeric repeats are manufactured by a known enzymatic mechanism (27), and these repeats certainly populate our collection of LIMEs. Strand slippage during DNA synthesis is another likely explanation for some of the repetitive elements identified. Likewise, gene conversion may underlie the LIMEs found among the rDNA genes. Similar to the previous description of Arabidopsis, although there were 25,066 unique repetitive LIMEs among the six genomes, these LIMEs were remarkably limited in the repeats they used. Thus, a repetitive LIME consisted of 1 or 2 short motifs; the set of all motifs used in LIMEs encompassed only 12 of the 1,699 possible 2- to 7-bp motifs (Fig. 3C). Moreover, only sorghum contained repetitive LIMEs of all 12 motifs, whereas other genomes used subsets of 5–11 motifs (Tables 2 and 3). On average, a repertoire of ∼7.8 unique motifs was used by repetitive LIMEs from one genome. Many repeats appeared to be microsatellites, consisting of motifs 2–6 bp long (28). The exceptions were TTTAGGG (LIME label 1 in Fig. 2B) and GAGA, which correspond to the telomeric repeat (29) and the GAGA-binding protein motif (30), respectively, and possibly two other motifs, ATACAT and ATTAT (Fig. 3C and SI Appendix, section S7).
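To make the idea of a repetitive element built from a short motif concrete, here is a minimal Python sketch that checks whether a sequence is a tandem repeat of a single 2- to 7-bp unit. The function name and the rule that at least two full copies are required are assumptions made for illustration; this is not the authors' classification procedure, and it handles only the single-motif case (not elements built from two motifs).

```python
def smallest_repeat_motif(seq, min_len=2, max_len=7):
    """Return the shortest motif (min_len..max_len bp) whose tandem
    repetition reproduces seq, or None if there is none.

    A trailing partial copy of the motif is allowed. The 2-7 bp range
    follows the text; requiring two full copies is an assumption.
    """
    seq = seq.upper()
    for k in range(min_len, max_len + 1):
        if len(seq) < 2 * k:
            break  # not enough sequence for two full copies
        # seq has period k if every base equals the base k positions earlier
        if all(seq[i] == seq[i - k] for i in range(k, len(seq))):
            return seq[:k]
    return None


print(smallest_repeat_motif("TTTAGGGTTTAGGGTTTAGGG"))  # TTTAGGG
print(smallest_repeat_motif("ACGTTGCA"))               # None
```

Note that the sketch reports the shortest period, so a run written as GAGA repeats is reported with the unit GA, and the motif it returns depends on where the sequence starts (GA versus AG). A real analysis would normalize rotations and strands before counting unique motifs.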

Repeat motifs of repetitive LIMEs in plant genomes

Distinct repeat motifs with a length of 2–7 bp shared between pairs of genomes that are found to contribute to the repetitive LIMEs

Colocalization of LIMEs: Clusters and Superclusters in Plants.

Whether to consider elements individually or in groups depends on the question being asked. For instance, when studying sequence function, it is often beneficial to view elements individually, whereas when studying evolution, as we do now, it is beneficial to group nearby elements into a cluster that serves as a coselected functional unit. The animal UCEs, including the nonexonic elements, are often clustered in the genomes near transcription factors and genes associated with development (3); however, little is known about the colocalization of plant LIMEs. Although this property is expected for repetitive plant LIMEs, where one tandem repeat sequence could be a source of many repetitive LIMEs, we also found more overlapping than nonoverlapping complex LIMEs in four of the six plant genomes, with the exceptions being rice and sorghum (Fig. 4A and SI Appendix, section S8). The soybean genome, for example, contained 5,451 copies of 336 unique complex elements that could be grouped into just 47 clusters, where adjacent/overlapping elements were ≤60,000 bp apart. In Arabidopsis, the cluster of such neighboring LIMEs containing the 4 LIMEs shared by all six genomes was located in close proximity to the centromere of chromosome 3. On the other hand, the cluster in rice (chromosome 2) containing the same LIMEs was not located near the centromere or the telomere (SI Appendix, Fig. S1). Colocalization of LIMEs had its extremes: Soybean chromosome 13 (SI Appendix, Fig. S2B) contained the largest group of 3,062 neighboring LIMEs (the average distance between the starting nucleotides of 2 neighboring LIMEs for the first 3,061 LIMEs was only 291 bp). This number was surprisingly high, surpassing the number of neighboring LIMEs in the remaining five genomes by at least an order of magnitude; the rest of the soybean genome had 43 clusters with an average of 3.325 elements per cluster.
Determining the origins of these abundant complex LIMEs in the region of the chromosome that is known for its unique association with the nucleolus organizer region (NOR) (31) could provide insights into differences between the soybean NOR and NORs of other species. For all six species, there were 631 complex clusters in total, with an average of ∼14 LIMEs per cluster (96.6%) and 306 complex LIMEs occurring alone (Fig. 4B). Also, there were 3,601 repetitive clusters (99.99%), with ∼1,007 LIMEs per cluster on average and 193 repetitive LIMEs occurring alone. A possible explanation for this clustering of LIMEs is horizontal gene/genome transfer events from organelle genomes.
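The distance-based grouping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: elements are represented by their start coordinates on a single chromosome, the gap is measured between consecutive starts, and the function name `cluster_by_distance` is hypothetical. The 60,000-bp default is the threshold quoted in the text.

```python
def cluster_by_distance(positions, max_gap=60_000):
    """Group start coordinates on one chromosome into clusters in which
    consecutive elements are at most max_gap bp apart.

    Assumption: distance is measured between start coordinates; the text
    does not say whether starts or element edges were used.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)   # close enough: extend current cluster
        else:
            clusters.append([pos])     # too far: start a new cluster
    return clusters


starts = [100, 500, 70_000, 200_000, 259_000]
print(cluster_by_distance(starts))
# [[100, 500], [70000], [200000, 259000]]
```

Because each element is compared only with its nearest neighbor, a cluster can span far more than 60 kbp in total, which is consistent with a single group holding thousands of closely spaced elements.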

Plant LIMEs are often found overlapping or in close proximity to each other. (A) Numbers of complex LIMEs that (i) overlap with at least one complex LIME and (ii) do not overlap. Shown in the last column is the total number of complex LIME clusters, where each element in the cluster either overlaps with another element or is located within 60 kbp of another complex LIME. At, Arabidopsis thaliana; Gm, Glycine max (soybean); Os, Oryza sativa (rice); Pt, Populus trichocarpa (cottonwood); Sb, Sorghum bicolor (sorghum); Vv, Vitis vinifera (grape). (B) Distribution of cluster sizes among clusters containing repetitive and complex LIMEs.

We next studied the relationship between the propensity of LIMEs to localize within the same cluster and to occur in multiple copies within the same genome and across multiple genomes. When constructing a network of clustered complex LIMEs, where two clusters were connected if they shared at least one common LIME, we found that the clusters were naturally grouped into 170 “superclusters,” where no 2 superclusters shared a single LIME (Fig. 5 and SI Appendix, section S9 and Fig. S9). When analyzing connectivity within superclusters, we found that LIMEs that belonged to the same cluster in one species were dispersed into multiple clusters in another species. For instance, in a supercluster that included a single complex LIME from Arabidopsis (LIME ID 1516), the average number of interspecies connections for one cluster was ∼3.4 (red edges in Fig. 5). Similarly, the intraspecies copies of a multicopy LIME often did not colocalize in the same cluster (dark green edges in Fig. 5 and SI Appendix, Fig. S10).
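Read as a graph problem, the superclusters described here are the connected components of a network whose nodes are clusters and whose edges join clusters sharing at least one LIME. The following Python sketch computes such components with a small union-find structure; the input format (a dictionary from cluster ID to a set of LIME IDs), the cluster names, and the function name are assumptions for illustration, not the authors' implementation.

```python
from collections import defaultdict


def superclusters(cluster_to_limes):
    """Return connected components of clusters, where two clusters are
    linked if they share at least one LIME ID.

    cluster_to_limes: dict mapping cluster ID -> set of LIME IDs
    (a hypothetical input format chosen for this sketch).
    """
    parent = {c: c for c in cluster_to_limes}

    def find(c):
        # Follow parent links to the component root, compressing the path
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    first_seen = {}  # LIME ID -> first cluster in which it appeared
    for cluster, limes in cluster_to_limes.items():
        for lime in limes:
            if lime in first_seen:
                parent[find(cluster)] = find(first_seen[lime])
            else:
                first_seen[lime] = cluster

    groups = defaultdict(set)
    for c in cluster_to_limes:
        groups[find(c)].add(c)
    return list(groups.values())


example = {
    "At_1": {1516},
    "Gm_1": {1516, 20},
    "Os_1": {20},
    "Sb_1": {99},
}
print(sorted(sorted(g) for g in superclusters(example)))
# [['At_1', 'Gm_1', 'Os_1'], ['Sb_1']]
```

By construction, no two components share a LIME, which matches the property stated in the text that no 2 superclusters share a single LIME.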

“Supercluster” of complex LIMEs that includes a single element from Arabidopsis (LIME ID 1516) and 24 clusters from four other genomes: soybean, rice, sorghum, and grape. The network of complex LIMEs from Arabidopsis (At maroon node), soybean (Gm gray nodes), rice (Os gold nodes), sorghum (Sb green nodes), and grape (Vv blue nodes) is shown. All elements in one cluster are connected to a selected representative with the edges of the same color as the nodes. Clusters of LIMEs within one species are connected through the representative nodes with dark green edges if they share one or more multiple-copy complex LIMEs. Clusters sharing LIMEs across multiple species are connected through their representatives with red edges.

LIMEs in Plants vs. Animals.

Individual elements are defined as the longest common subsequence between two larger sequences. Our algorithm finds all such matching subsequences (≥100 bp) between genomes. The simplest way to quantify the elements is to count them individually. However, this leads to “double counting,” because many overlap (Materials and Methods). The structural taxonomy shown in Fig. 1 can be used to quantify them differently. It breaks down cross-species elements into two initial categories: repeated motifs and complex sequences. SI Appendix, Table S3 lists the 241 repeated motifs in the animal set and the 12 motifs in the plant set. To determine whether any of the repeated sequences were contained within mobile elements, we used the Repeat Masker server (32, 33), scanning the entire set of repetitive LIMEs. Among our LIMEs, we found homology only to several long interspersed elements (LINEs) and LTRs in mammals (1 LINE and 2 LTRs in human, 2 LINEs and 8 LTRs in rat as well as in mouse, and 1 LINE in dog); no homologous repeats for the chicken or plant LIMEs were found. Interestingly, nine repetitive LIMEs are shared between plants and animals. However, the LIME distribution is quite different between the two groups: Only a small minority of plant LIMEs have complex sequences [1,110 (4%)]. On the other hand, most of the elements in the animal set have complex sequences [1,572,580 (85%)]. If we count not the existence of an element but the total number of copies of it in each genome, these figures change to 0.24% and 60% for plants and animals, respectively. The number of copies of repetitive and complex elements also differs: 16,029 (64%) of repeated motif elements in plants and 151,091 (54%) in animals have multiple copies in at least one genome. For complex elements, the numbers are 435 (39%) and 455,052 (29%), respectively.
In the plant set, there were 1,110 unique complex sequences of LIMEs shared by two genomes, 234 shared by three genomes, 144 shared by four genomes, 54 shared by five genomes, and 4 shared by all six genomes (Fig. 1 and SI Appendix, Figs. S1–S5). Exact copies of the shortest of the last four LIMEs were also found in 76 different organisms, including species from plants, animals, and fungi.
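As a toy illustration of finding identical stretches between two sequences, the sketch below reports every exact match of at least `min_len` bases that cannot be extended on either side. It reads the "matching subsequences" above as contiguous exact matches, which is an assumption, and it uses a simple quadratic-time dynamic program rather than whatever indexing the authors used; whole genomes would need a suffix array or similar index. Reverse-complement matches are not considered, and the function name is hypothetical.

```python
def maximal_exact_matches(a, b, min_len=100):
    """Return (start_a, start_b, length) for every exact match between
    a and b that is at least min_len long and cannot be extended to the
    left or right. The 100-bp default follows the threshold in the text.
    """
    matches = []
    prev = [0] * (len(b) + 1)  # prev[j]: match length ending at a[i-2], b[j-1]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Length of the identical run ending here (left-maximal)
                cur[j] = prev[j - 1] + 1
                at_end = i == len(a) or j == len(b)
                # Right-maximal: the next bases differ or a sequence ends
                if (at_end or a[i] != b[j]) and cur[j] >= min_len:
                    matches.append((i - cur[j], j - cur[j], cur[j]))
        prev = cur
    return matches


print(maximal_exact_matches("TTACGTACGGA", "CCACGTACGTT", min_len=6))
# [(2, 2, 7)]
```

Counting matches this way also shows where the "double counting" mentioned above can come from: different genome pairs can yield overlapping maximal matches at the same locus.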


Science Says: Why scientists prize plant, animal genomes

NEW YORK (AP) -- Just about every week, it seems, scientists publish the unique DNA code of some creature or plant. Just in February, they published the genome for the strawberry, the paper mulberry tree, the great white shark and the Antarctic blackfin icefish.

They also announced that, thanks to a crowdfunding campaign, they'd produced the genome of Lil BUB, a female cat with a large internet following.

That followed a notable advance in January: an improved genome for the axolotl, a salamander renowned for regrowing severed limbs and other body parts.

Scientists have been uncovering genomes for quite a while. The first from an animal -- a worm -- came in 1998. Now, the technology has advanced far enough that scientists last year announced a project to produce the genomes for all life forms on Earth other than bacteria and single-celled organisms called archaea. They called it a "moonshot for biology."

But what's the point of uncovering new genomes?

For scientists, a detailed look under the hood of their favorite organism provides a foothold for learning the deepest secrets of their objects of attention. It leads to discoveries about how life works, and possibly how to prevent disease.

Take the mosquito. Late last year, researchers published a much-improved description of the DNA code for a particularly dangerous species of mosquito: Aedes aegypti, notorious for spreading Zika, dengue and yellow fever.

That achievement came from analyzing the DNA of 80 mosquito brothers. They were born in Leslie Vosshall's lab at Rockefeller University in New York, where thousands of mosquitoes swarmed in cages recently as Krithika Venkataraman was trying to make some more.

She stuck a tube that protruded from her mouth like a straw into a transparent cube filled with male mosquitoes. Then she repeatedly sucked about 30 males at a time into the tube. She counted them, and then blew them into another cube that housed females. Before long, the two sexes were mating.

You can think of a genome as an instruction book for building a living thing. Its language is a four-letter alphabet, whose letters stand for the four compounds that make up the innards of the DNA molecule. The order of those compounds along the molecule is the code; it creates "words" that we call genes.

The mosquito genome, for example, is about 1.28 billion letters long, a bit less than half the length of the human version. Knowing the DNA sequence lets scientists manipulate it with gene editing techniques, said Ben Matthews of the Vosshall lab, who was part of the international team that published the refined description of the mosquito genome last November.

And once researchers started analyzing that version of the DNA code, discoveries began to pop out.

-- They nearly doubled the known size of a family of genes that help mosquitoes sense information from their environment, such as the odor of humans. That was "totally, mind-blowingly, unexpected," Vosshall said. (Vosshall's salary is paid by the Howard Hughes Medical Institute, which also supports The Associated Press Health & Science Department.)

Further study may reveal surprises about what mosquitoes pay attention to, Vosshall said. And that could lead to better lures for mosquito traps, as well as better repellents. Maybe scientists can find something "10,000 times more disgusting" to a mosquito than the old standby, DEET, she said.

-- They found new details about genes that let some mosquitoes resist certain insecticides. That's a possible step toward predicting what insecticides would be useless for fighting certain populations, as well as a potential lead for coming up with new chemical weapons against the insect.

-- They found previously unknown targets for a major class of insecticides. That could open the door to designing new versions that target mosquitoes while sparing beneficial insects and posing less risk to people.

-- They narrowed the search for genetic variants that prevent some Aedes aegypti mosquitoes from infecting people with dengue, a severe flu-like illness that sickens millions every year. If those variants can be identified, scientists might use genetic engineering to reproduce them in some mosquitoes, which could then be released to spread the variants through wild populations, Vosshall said. Those variants, or others, might also work for reducing threats of spreading Zika and yellow fever, Vosshall and Matthews said.

-- A similar strategy might be used to make mosquito populations overproduce males. That would reduce mosquito bites in the short term -- only females bite -- and open the door to shrinking wild populations through genetic engineering. The new genome revealed details of the DNA stretch that makes mosquitoes develop as males, which Matthews called "step one" in pursuing the make-more-males strategy.

The salamander genome published in January built on a previous publication by European scientists last year. Although its genome is about 10 times the size of the human one, which makes the analysis harder, the axolotl's regenerating capabilities are an obvious lure.

Axolotls can replace "almost anything you can cut off of them, as long as you don't cut off their heads," says Jeramiah Smith of the University of Kentucky in Lexington, an author of the more recent genome paper.

But Smith points to another trick that might pay off sooner for human medicine: The salamander can also heal large wounds without scarring.

As for learning how to let people grow back a severed arm, he figures that's a long way off.

"That probably won't be useful for me," joked Smith, who's 42. "I'll be dead, so I won't need to grow my arm back."

And Lil BUB? She's the size of a kitten even though she's 8 years old, and has a number of other odd traits. Scientists looked for genetic mutations, and found altered genes that appear to be responsible for her extra toes and for a rare bone disease.

Follow Malcolm Ritter at @MalcolmRitter.

The Associated Press Health & Science Department receives support from the Howard Hughes Medical Institute's Department of Science Education. The AP is solely responsible for all content.


Annelida

Only one complete annelid mtDNA sequence has been determined, that of the oligochaete Lumbricus terrestris (97); small portions have been published of two other annelids, Platynereis and Helobdella, and of the related taxa Galatheolinum (phylum Pogonophora) and Urechis (phylum Echiura) (31) (Fig. 4B). Unlike most studied mtDNAs, all Lumbricus mitochondrial genes are encoded on the same strand. One speculates that there could be a ‘ratchet’ effect to such a set of rearrangements. That is, if rearrangements were to place all genes on one strand, it would be expected that transcription of the other strand would soon cease, since presumably selection would not maintain the necessary signaling elements and the futile transcription would be an energetic burden. This would then constitute an effective barrier to further inversions which would place a gene on the non-transcribed strand unless that inversion also carried with it the necessary sequence elements to resume its expression.

In several respects Lumbricus mtDNA is quite conventional: only ATG is used as an initiation codon, whereas most mtDNAs use a variety of alternatives (18); the tRNAs have uncommonly uniform potential secondary structures; nucleotide composition is more balanced than for most mtDNAs; and non-coding nucleotides are very few. One unusual feature, however, is that A8 and A6 are separated by ∼2700 nt. In nearly all animal mtDNAs A8 and A6 are adjacent, often overlapping in alternate reading frames. In mammals, A8 and A6 are translated from a bicistronic transcript, with translation initiating alternatively at the 5′ end of the mRNA for A8 or at an internal start codon for A6 (98). It is unknown whether this is also the mode of translation of these two genes in other organisms, although, if so, it could explain their nearly universal juxtaposition. Other than A8 being missing from the mtDNAs of nematodes (see below), all exceptions to this are members of phyla assigned to the group ‘Eutrochozoa’ (99). A8 is missing from the mtDNA of Mytilus (Mollusca) (12), and these two genes are separated in the mtDNAs of Lumbricus (Annelida); Helobdella and Platynereis (Annelida; unpublished); three pulmonate snails (Mollusca) (90); Dentalium and Nautilus (Mollusca; unpublished); Urechis (Echiura; unpublished); Galatheolinum (Pogonophora; unpublished); Phascolopsis (Sipuncula; unpublished); and Terebratalia (Brachiopoda; unpublished). It may be that loss of co-translation of this bicistron is a derived feature of the Eutrochozoa; this could be studied in members that retain A8 adjacent to A6, such as the polyplacophoran mollusk Katharina (95).

