14.2: Eukaryotic Origins - Biology

The fossil record and genetic evidence suggest that prokaryotic cells were the first organisms on Earth. These cells originated approximately 3.5 billion years ago, about 1 billion years after Earth’s formation, and were the only life forms on the planet until eukaryotic cells emerged approximately 2.1 billion years ago. During this prokaryotic era, photosynthetic prokaryotes evolved that could use the energy of sunlight to synthesize organic materials (such as carbohydrates) from carbon dioxide and an electron source (such as hydrogen, hydrogen sulfide, or water).

Photosynthesis that uses water as the electron donor consumes carbon dioxide and releases molecular oxygen (O2) as a byproduct. Over millions of years, the activity of photosynthetic bacteria progressively saturated Earth’s water with oxygen and then oxygenated the atmosphere, which previously contained much higher concentrations of carbon dioxide and much lower concentrations of oxygen. Many of the older anaerobic prokaryotes could not function in this new, aerobic environment. Some species perished, while others survived in the anaerobic environments that remained. Still other early prokaryotes evolved mechanisms, such as aerobic respiration, that exploit the oxygenated atmosphere by using oxygen to release the energy stored in organic molecules. Aerobic respiration extracts more energy from organic molecules than anaerobic pathways do, which contributed to the success of these species (as the number and diversity of aerobic organisms living on Earth today attests). The evolution of aerobic prokaryotes was an important step toward the evolution of the first eukaryote, but several other distinguishing features had to evolve as well.
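Why only water-based photosynthesis releases oxygen can be made concrete with the generalized photosynthesis equation usually credited to C. B. van Niel, written here per unit of carbohydrate (CH2O) as standard textbook stoichiometry added for illustration. H2A stands for any hydrogen-containing electron donor, and the byproduct "A" depends on which donor is used: nothing with hydrogen, elemental sulfur with hydrogen sulfide, and molecular oxygen only with water.

```latex
\begin{aligned}
\text{General donor } \mathrm{H_2A}:\quad & \mathrm{CO_2} + 2\,\mathrm{H_2A} \xrightarrow{\text{light}} (\mathrm{CH_2O}) + \mathrm{H_2O} + 2\,\mathrm{A} \\
\text{Hydrogen:}\quad & \mathrm{CO_2} + 2\,\mathrm{H_2} \xrightarrow{\text{light}} (\mathrm{CH_2O}) + \mathrm{H_2O} \\
\text{Hydrogen sulfide:}\quad & \mathrm{CO_2} + 2\,\mathrm{H_2S} \xrightarrow{\text{light}} (\mathrm{CH_2O}) + \mathrm{H_2O} + 2\,\mathrm{S} \\
\text{Water:}\quad & \mathrm{CO_2} + 2\,\mathrm{H_2O} \xrightarrow{\text{light}} (\mathrm{CH_2O}) + \mathrm{H_2O} + \mathrm{O_2}
\end{aligned}
```

Cancelling one water molecule from each side of the last line gives the net form CO2 + H2O → (CH2O) + O2; multiplied by six, this is the familiar overall equation 6 CO2 + 6 H2O → C6H12O6 + 6 O2. In this form the released O2 comes from the water, not from the carbon dioxide.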

Endosymbiosis

The origin of eukaryotic cells was largely a mystery until a revolutionary hypothesis was comprehensively examined in the 1960s by Lynn Margulis. The endosymbiotic theory states that eukaryotes arose when one prokaryotic cell engulfed another, the engulfed cell continued to live inside its host, and the two evolved together over time until they were no longer recognizable as separate cells. This once-revolutionary hypothesis met considerable resistance at first but is now widely accepted, and work continues on uncovering the steps involved in this evolutionary process and the key players. It has become clear that many nuclear eukaryotic genes, and the molecular machinery responsible for replicating and expressing those genes, are closely related to their counterparts in the Archaea. By contrast, the metabolic organelles and the genes responsible for many energy-harvesting processes originated in bacteria. Much remains to be clarified about how this relationship came about, and it continues to be an exciting field of discovery in biology. Several endosymbiotic events likely contributed to the origin of the eukaryotic cell.

Mitochondria

Eukaryotic cells may contain anywhere from one to several thousand mitochondria, depending on the cell’s level of energy consumption. Each mitochondrion measures 1 to 10 micrometers in length and exists in the cell as a moving, fusing, and dividing oblong spheroid (Figure 13.2.1). Mitochondria cannot survive outside the cell. As photosynthesis oxygenated the atmosphere and successful aerobic prokaryotes evolved, evidence suggests that an ancestral cell engulfed and kept alive a free-living aerobic prokaryote. This gave the host cell the ability to use oxygen to release the energy stored in nutrients. Several lines of evidence support the idea that mitochondria derive from this endosymbiotic event. Mitochondria are shaped like a specific group of bacteria and are surrounded by two membranes, which is the arrangement expected when one membrane-bound organism is engulfed by another. The mitochondrial inner membrane has extensive infoldings, called cristae, that resemble the textured outer surface of certain bacteria.

Mitochondria divide on their own by a process that resembles binary fission in prokaryotes. Mitochondria have their own circular DNA chromosome carrying genes similar to those expressed by bacteria. Mitochondria also have their own ribosomes and transfer RNAs that resemble these components in prokaryotes. Together, these features support the conclusion that mitochondria were once free-living prokaryotes.

Chloroplasts

Chloroplasts are one type of plastid, a group of related organelles in plant and algal cells that are involved in the storage of starches, fats, proteins, and pigments. Chloroplasts contain the green pigment chlorophyll and are the site of photosynthesis. Genetic and morphological studies suggest that plastids evolved from the endosymbiosis of an ancestral cell that engulfed a photosynthetic cyanobacterium. Plastids are similar in size and shape to cyanobacteria and are enveloped by two or more membranes, corresponding to the inner and outer membranes of cyanobacteria. Like mitochondria, plastids contain circular genomes and divide by a process reminiscent of prokaryotic cell division. The chloroplasts of red and green algae contain DNA sequences that are closely related to those of photosynthetic cyanobacteria, suggesting that red and green algae are direct descendants of this endosymbiotic event.

Mitochondria likely evolved before plastids: nearly all eukaryotes have either functional mitochondria or mitochondria-like organelles, whereas plastids are found only in a subset of eukaryotes, such as land plants and algae. One hypothesis of the evolutionary steps leading to the first eukaryote is summarized in Figure 13.2.2.

The exact steps leading to the first eukaryotic cell can only be hypothesized, and some controversy exists regarding which events actually took place and in what order. Spirochete bacteria have been hypothesized to have given rise to microtubules, and a flagellated prokaryote may have contributed the raw materials for eukaryotic flagella and cilia. Other scientists suggest that membrane proliferation and compartmentalization, not endosymbiotic events, led to the development of mitochondria and plastids. However, the vast majority of studies support the endosymbiotic hypothesis of eukaryotic evolution.

The early eukaryotes were unicellular like most protists are today, but as eukaryotes became more complex, the evolution of multicellularity allowed cells to remain small while still exhibiting specialized functions. The ancestors of today’s multicellular eukaryotes are thought to have evolved about 1.5 billion years ago.

Section Summary

The first eukaryotes evolved from ancestral prokaryotes by a process that involved membrane proliferation, the loss of a cell wall, the evolution of a cytoskeleton, and the acquisition and evolution of organelles. Nuclear eukaryotic genes appear to have originated in the Archaea, whereas the energy machinery of eukaryotic cells appears to be bacterial in origin. Mitochondria and plastids originated from endosymbiotic events in which ancestral cells engulfed an aerobic bacterium (in the case of mitochondria) and a photosynthetic bacterium (in the case of chloroplasts). The evolution of mitochondria likely preceded the evolution of chloroplasts. There is also evidence of secondary endosymbiotic events, in which a eukaryotic host acquired a plastid by engulfing another eukaryote that already contained one.

Multiple Choice

What event is thought to have contributed to the evolution of eukaryotes?

A. global warming
B. glaciation
C. volcanic activity
D. oxygenation of the atmosphere

D

Mitochondria most likely evolved from _____________.

A. a photosynthetic cyanobacterium
B. cytoskeletal elements
C. aerobic bacteria
D. membrane proliferation

C

Free Response

Describe the hypothesized steps in the origin of eukaryote cells.

Eukaryotic cells arose through endosymbiotic events that gave rise to energy-producing organelles within the eukaryotic cells, such as mitochondria and plastids. The nuclear genome of eukaryotes is most closely related to the Archaea, so it may have been an early archaeon that engulfed a bacterial cell that evolved into a mitochondrion. Mitochondria appear to have originated from an alpha-proteobacterium, whereas chloroplasts originated from a cyanobacterium. There is also evidence of secondary endosymbiotic events. Other cell components may also have resulted from endosymbiotic events.

Glossary

endosymbiosis
the engulfment of one cell by another such that the engulfed cell survives and both cells benefit; the process responsible for the evolution of mitochondria and chloroplasts in eukaryotes

Endosymbiotic theories for eukaryote origin

For over 100 years, endosymbiotic theories have figured in thinking about the differences between prokaryotic and eukaryotic cells. More than 20 different versions of endosymbiotic theory have been presented in the literature to explain the origin of eukaryotes and their mitochondria. Very few of those models account for eukaryotic anaerobes. The role of energy, and the energetic constraints that prokaryotic cell organization placed on evolutionary innovation in cell history, have recently come to bear on endosymbiotic theory. Only cells that possessed mitochondria had the bioenergetic means to attain eukaryotic cell complexity, which is why there are no true intermediates in the prokaryote-to-eukaryote transition. Current versions of endosymbiotic theory have it that the host was an archaeon (an archaebacterium), not a eukaryote. Hence the evolutionary history and biology of archaea now bear on eukaryotic origins more than ever before. Here, we have compiled a survey of endosymbiotic theories for the origin of eukaryotes and mitochondria, and for the origin of the eukaryotic nucleus, summarizing the essentials of each and contrasting some of their predictions with the observations. A new aspect of endosymbiosis in eukaryote evolution comes into focus from these considerations: the host for the origin of plastids was a facultative anaerobe.

Keywords: anaerobes; endosymbiosis; eukaryotes; mitochondria; nucleus; plastids.

Figures

Models describing the origin of the nucleus in eukaryotes.
Models describing the origin of mitochondria and/or chloroplasts in eukaryotes.
Mitochondrial origin in a prokaryotic host.
Evolution of anaerobes and the plastid.



Changing ideas about eukaryotic origins

The origin of eukaryotic cells is one of the most fascinating challenges in biology, and has inspired decades of controversy and debate. Recent work has led to major upheavals in our understanding of eukaryotic origins and has catalysed new debates about the roles of endosymbiosis and gene flow across the tree of life. Improved methods of phylogenetic analysis support scenarios in which the host cell for the mitochondrial endosymbiont was a member of the Archaea, and new technologies for sampling the genomes of environmental prokaryotes have allowed investigators to home in on closer relatives of founding symbiotic partners. The inference and interpretation of phylogenetic trees from genomic data remains at the centre of many of these debates, and there is increasing recognition that trees built using inadequate methods can prove misleading, whether describing the relationship of eukaryotes to other cells or the root of the universal tree. New statistical approaches show promise for addressing these questions but they come with their own computational challenges. The papers in this theme issue discuss recent progress on the origin of eukaryotic cells and genomes, highlight some of the ongoing debates, and suggest possible routes to future progress.

1. What did we think before?

In the rooted ‘three domains’ tree [1], the eukaryotic nuclear lineage is a deep-branching sister group to the Archaea, implying that eukaryotes are as old as that group of prokaryotes (figure 1). The species at the base of eukaryotes in the three domains tree are parasites like Giardia and Microsporidia, which lack classical mitochondria, in agreement with the hypothesis that they descended from lineages (often called Archezoans [2]) that diverged from other eukaryotes before the mitochondrial endosymbiosis. In the three domains tree, the eukaryotes (cells with a nucleus) existed before the mitochondrial endosymbiosis. The apparent agreement between phylogeny and cell biology made this version of early evolution compelling. Thus, although competing hypotheses were in circulation at the time [3–7], and many genes on eukaryotic genomes were already known to conflict with the three domains tree [8,9], it is the one that appeared in standard textbooks and works of popular science. A tree diagram is the single figure in the ‘Origin of Species’ [10, pp. 160–161], so it was fitting that an updated popular science version [11] of Darwin's classic also contains only a single figure. The tree chosen was an unrooted version of the three domains tree, depicting Archaea and eukaryotes as separate groups and with the Archezoans clearly labelled at the base of eukaryotes.

Figure 1. Competing hypotheses for the origin of eukaryotes. (a) In the textbook ‘three domains’ tree, the eukaryotes and Archaea are monophyletic sister groups, with each lineage as old as the other. (b) The ‘two domains’ view, supported by improved phylogenetic methods and taxonomic sampling. In this scenario, Bacteria and Archaea comprise the two primary cellular lineages, with eukaryotes formed in a symbiosis between them. Both trees are shown rooted on the branch leading to the Bacteria although, as discussed in §5, the analyses on which this root position is based must be interpreted with caution.

The papers in this theme issue describe and discuss how this view of eukaryotic evolution has radically changed over the past few years, and identify major ongoing controversies and challenges. The contributors sometimes offer very different perspectives on these issues, so there is principled disagreement as well as consensus. In part, this reflects not only the rapid and exciting progress being made but also the inherent difficulty of inferring ancient events from small amounts of incomplete data using imperfect methods, and the ambition and scale of the scientific questions being asked. Some of the most marked changes in thinking concern the nature of the host for the mitochondrial endosymbiont and the recognition that organelles related to mitochondria are ubiquitous among eukaryotes, including former Archezoans. These changes have removed a major line of evidence for the view that the mitochondrial host was already a eukaryote and, in turn, have led to more serious consideration of hypotheses in which an Archaeon was the host for the mitochondrial endosymbiosis in founding the eukaryotic lineage. Debates about the role of the mitochondrial endosymbiont in eukaryotic genome evolution, and the evolution and diversity of contemporary mitochondrial homologues, including hydrogenosomes and mitosomes, are now major topics of investigation. The origins of genes and the extent of non-endosymbiotic lateral gene transfers in eukaryotic evolution are still controversial, but it is now clear that eukaryotes owe a major genomic debt to Archaea and Bacteria as well as possessing a previously under-appreciated talent for gene invention and innovation. Whether viruses have also played a role in eukaryotic origins and evolution is hotly debated, fuelled to a degree by recent discoveries of unexpectedly large and gene-rich DNA viruses.

Microbial ecologists have long known that cultured and studied microbes comprise only a small fraction of extant unicellular life, so it is to be expected that our understanding of cellular evolution has been limited by incomplete and biased sampling of natural microbial diversity. New metagenomic and single cell genome sequencing methods hold enormous promise to sample the hitherto unstudied majority of microbial life. As discussed in this issue, these methods have already identified new archaeal lineages that are more closely related to eukaryotes than any yet sampled, and that share genes previously thought to define important aspects of the biology of eukaryotic cells. Concerns about the accuracy of trees for inferring deep eukaryotic relationships or gene origins, which are often made using overly simple statistical models and short sequences, occupy a number of our contributors. The need to consider the fit between model and data, and to recognize that poor models will generally make poor trees, is an oft repeated and important cautionary message. Trees and networks of various sorts will continue to play a major role in studies aiming to investigate eukaryotic evolution and to disentangle vertical and horizontal descent, but existing methods are fraught with problems and the search for congruence between independent lines of evidence will always be important.

2. A new host for the mitochondrial endosymbiont

The three domains tree describes eukaryotes and Archaea as separate groups and has a fully formed eukaryotic cell as the host for the mitochondrial endosymbiont [1]. However, at the same time as some analyses were recovering the three domains tree, other analyses (reviewed in [12]) of the same data but often using better methods were supporting another hypothesis called the ‘eocyte tree’ [3]. In the ‘eocyte tree’, eukaryotes originate from within the Archaea as the sister group of species like Sulfolobus—which James Lake [13] classified within a separate kingdom called Eocyta or ‘dawn cells’ [3], and which Woese et al. later named the Crenarchaeota [1]. Support for the eocyte tree has continued to accumulate in recent years with improved evolutionary models and wider sampling of environmental Archaea [12,14,15]. Thus, analyses of universal core genes using better-fitting models place eukaryotes within the diversity of Archaea, branching with a group called the ‘TACK’ superphylum which contains the lineages Thaumarchaeota, Aigarchaeota, Crenarchaeota and Korarchaeota [16–19]. As eocytes were originally defined phylogenetically as the sister group of eukaryotes [3], these new trees are consistent with the eocyte hypothesis. Our special issue opens with a personal perspective by James Lake [13] describing the genesis and development of the eocyte hypothesis and other seminal contributions, including his highly original ‘ring of life’ hypothesis that invokes large gene flows as major drivers in eukaryotic evolution. This ‘ring of life’ is also the focus of the paper by McInerney et al. [20], who argue that it is the best-supported and most general hypothesis to explain the different types of data that speak to eukaryotic origins.

If the trees that place the origin of the eukaryotic nuclear lineage within the Archaea are correct, then we should expect to find new species that are more similar to eukaryotes at the level of genes and proteins. Eugene Koonin [21] discusses recent data that are consistent with this hypothesis and demonstrates how understanding archaeal genome evolution is important for understanding early eukaryotic evolution. Consistent with the predictions of recent phylogenomic analyses, prokaryotic homologues of key eukaryotic componentry, including genes involved in the cytoskeleton and ubiquitin-mediated protein degradation, are found only among the TACK Archaea. But Koonin [21] also shows that homologues of other signature eukaryotic genes, including components of the cell division, membrane remodelling, and RNA interference machineries, have a patchy distribution across the sequenced diversity of Archaea, suggesting a complex history of gene loss and potentially horizontal transfer throughout archaeal evolution. As mentioned in §1, limited and potentially biased sampling of natural microbial diversity may limit our inferences of early evolution. The paucity of genomes is particularly acute for Archaea because the exploration of this domain has traditionally lagged behind that of Bacteria and eukaryotes. This situation is rapidly changing because of advances in single cell and metagenomic approaches that now enable the genomes of uncultured microbes to be sequenced directly from the environment [22]. The most spectacular finding to date has been that of the Lokiarchaeota, an archaeal lineage that appears to contain the closest relatives of eukaryotes discovered so far [23,24]. Consistent with their sister group relationship to eukaryotes, Lokiarchaeota have more eukaryotic signature genes than any other Archaea yet described [23]. Saw et al. [25] describe the methods they used to sequence and assemble the genome of Lokiarchaeum and other uncultured members of the TACK group, and the implications of the Lokiarchaeota gene repertoires for the origins of key eukaryotic features such as the cytoskeleton, membrane remodelling and phagocytosis. This final trait has often been argued [26] to be a key ability of the ancestral host cell that acquired the mitochondrial endosymbiont. Intriguingly for theories of eukaryogenesis, the ESCRT machinery, found in eukaryotes as well as in Lokiarchaeota and some other TACK Archaea, has recently been shown to regulate the reformation of the nuclear envelope after mitosis [27].

3. Endosymbiosis, mitochondrial homologues and the origins of bacterial genes on eukaryotic genomes

The rejection of the Archezoa hypothesis, and the discovery of mitochondrial homologues in parasites and anaerobes that were previously thought to primitively lack them [28], has stimulated interest in ideas that propose that the mitochondrial endosymbiosis was an ancestral event in eukaryotic evolution. It has also focused attention on the mitochondrial endosymbiont as the source of some, perhaps many, of the bacterial genes on eukaryotic genomes. Two contributions to our issue present differing perspectives on some of these questions. Martin et al. [29] provide a detailed and beautifully illustrated discussion of endosymbiotic hypotheses for eukaryotic origins, arguing that those involving an autotrophic Archaeon and the mitochondrial endosymbiont fit current data better than alternative hypotheses. Stairs et al. [30] focus on the origins of metabolic diversity among the mitochondrial homologues—organelles sharing common ancestry with mitochondria including hydrogenosomes and mitosomes—that have been discovered in anaerobic and parasitic protists from across the eukaryotic tree. They suggest that horizontal gene transfer (HGT) outside of endosymbiosis may be an important source of genes for these diverse metabolisms and that convergence driven by HGT and common ecology is a recurring feature of mitochondrial evolution.

Some of this debate reflects the difficulties in achieving robust conclusions from weakly supported gene trees compounded by patchy sampling, and the differences in opinion about ancestral gene content and the degree to which the genome of the mitochondrial endosymbiont was itself chimaeric [31]. HGT appears to be a powerful force shaping the genome evolution of modern Bacteria and there is no particular reason to suppose that ancient Bacteria were any different.

The impact of horizontal transfer on eukaryotic genomes is highly relevant because a high proportion of genes on eukaryotic genomes appear to originate from Bacteria [8,9,32–34]. Some have suggested that most of these genes are derived from ancient endosymbionts [32], whereas others have advocated continual gene flow from diverse donors over time [33]. There is good evidence for both sources (reviewed in [32,35–37]), but disagreement about their relative importance [38,39]. Katz [40] presents an analysis of patterns of gene presence and absence in the context of an extremely broad sampling of eukaryotic diversity to identify candidate prokaryote to eukaryote HGT. Her analyses identify over a thousand transfers into eukaryotes, but most are restricted to one or a few closely related genomes. This is interpreted as evidence that HGT is an ongoing process, but that most detectable events are recent and, with the exception of the genes originating from the mitochondrial and plastid endosymbionts, that relatively few transferred genes have persisted from the earliest period of eukaryotic evolution. These data suggest an interesting parallel between HGT and other processes of genome evolution such as point mutation, gene- and whole-genome duplication, in which most new genetic material is quickly lost unless maintained by positive selection [41,42].

Although horizontal transfer is generally held to be more frequent in prokaryotes than eukaryotes [35], few direct comparisons have been performed. The contribution of Szöllősi et al. [43] addresses this issue. The authors present a case study of gene transfer dynamics in fungi and cyanobacteria, exemplars of eukaryotic and prokaryotic groups for which abundant genome data are available. Their analyses make use of phylogenetic profiles as well as gene tree–species tree reconciliation methods to detect and map transfer events throughout the evolutionary history of both groups. The results suggest that rates of gene transfer in these groups are broadly similar, providing some support for the idea that the importance and dynamics of HGT may be qualitatively similar among prokaryotes and eukaryotes. This result, if found to hold more generally, would suggest an ongoing flux of bacterial genes into eukaryotic genomes from a variety of sources in addition to the large-scale gains associated with ancestral endosymbioses.

4. Eukaryotic genome evolution from within

Eukaryotic genomes encode a significant fraction—as much as 63% according to recent analyses of the yeast genome [44]—of eukaryote-specific genes that underpin key aspects of eukaryotic biology. Traditional models for eukaryotic gene origins emphasized the duplication and functional divergence of pre-existing genes [45], but there is increasing evidence that the de novo origin of new genes from noncoding sequence is also important. McLysaght & Guerzoni [46] provide an overview of these data and provide interesting examples from across the eukaryotic tree, some of which are functionally important and subject to positive selection. Evidence for widespread de novo gene origination in modern eukaryotes provides a plausible mechanism by which eukaryote-specific genes could have evolved in the nascent eukaryotic stem lineage during the origin of eukaryotes.

One of the most distinctive features of eukaryotic genomes in comparison to prokaryotes is the preponderance of noncoding sequence, which in many lineages outweighs or even dwarfs the quantity of coding DNA. While much of this excess material is probably selfish or non-functional [47], high-profile debate currently rages over the extent to which noncoding elements contribute to eukaryotic phenotypic complexity by regulating the expression of coding sequences [48–52]. Elliott & Gregory [53] contribute to this debate by providing new insights into the relationships between genome size, coding capacity, repetitive content and other genomic parameters from the largest survey of eukaryotic genome diversity to date. Their data underline striking differences between the streamlined, gene-rich genomes of prokaryotes and the large, highly repetitive genomes of many eukaryotes. These differences may arise from the fundamental changes in the population genetic environment that accompanied the origin of eukaryotes, ranging from increased cell size (and concomitant reduction in population densities) to the evolution of meiosis and sex. The relative contributions of genetic drift [54], mutation [55] and selection [56,57] (perhaps at multiple levels [58]) to the origin and evolution of eukaryotes and their genomes remain a fascinating area of debate, and broad comparative data of the type presented by Elliott & Gregory [53] will continue to play an important role in contrasting the predictions of the leading hypotheses.

5. How good are our methods for inferring the past?

Much of the progress discussed in this volume has been facilitated by the increasing ease with which whole genomes and transcriptomes can be sequenced, even for uncultured organisms. In principle, obtaining representative sampling is no longer a major hurdle, but the increasing rate of data generation has largely outstripped the computational power needed to analyse it. This has created a situation where undesirable trade-offs are made between dataset size and model adequacy, and this is hindering progress. Better phylogenetic models are already available that recognize that the evolutionary process is complex and may change over time and between species, but they come with a cost of increased analysis time and hence cannot be used for large numbers of species. As improved taxonomic sampling is already known to affect the accuracy of phylogenetic reconstructions [59], improving the scalability of complex methods to handle more data is highly desirable. Nicolas Lartillot [60] provides an overview of these issues and highlights potential solutions to some of the outstanding problems. Bayesian approaches provide a natural framework for fitting more complex and biologically motivated models to genome data, but Lartillot [60] argues that future progress may depend on the development of alternatives to standard Markov Chain Monte Carlo (MCMC) algorithms. MCMC has underpinned the successes of Bayesian phylogenetics to date, but the technique is now 50 years old and can struggle to achieve convergence on large-scale genomic datasets, even with continuing advances in computational power.
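To make the MCMC discussion concrete, here is a minimal random-walk Metropolis sampler for a deliberately tiny problem: the posterior distribution of a single branch length separating two aligned sequences. The Jukes–Cantor (JC69) likelihood, the exponential prior, the step size and the counts (30 differing sites out of 200) are illustrative assumptions, not taken from Lartillot [60]. Real phylogenetic MCMC must also sample tree topologies and many more parameters, which is where convergence becomes difficult.

```python
import math
import random


def log_posterior(d, k, n, prior_rate=1.0):
    """Unnormalized log posterior of branch length d (substitutions per site).

    Assumes a JC69 model, k differing sites out of n aligned sites,
    and an exponential prior with rate `prior_rate` (all illustrative choices).
    """
    if d <= 0:
        return -math.inf  # branch lengths must be positive
    p = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))  # expected proportion of differing sites
    return k * math.log(p) + (n - k) * math.log(1.0 - p) - prior_rate * d


def metropolis(k, n, steps=20000, step_size=0.05, seed=1):
    """Random-walk Metropolis sampler; returns the full chain of sampled d values."""
    rng = random.Random(seed)
    d = 0.1  # arbitrary starting value
    current = log_posterior(d, k, n)
    chain = []
    for _ in range(steps):
        proposal = d + rng.gauss(0.0, step_size)  # symmetric proposal, so no Hastings term
        candidate = log_posterior(proposal, k, n)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            d, current = proposal, candidate
        chain.append(d)
    return chain


chain = metropolis(k=30, n=200)
kept = chain[2000:]  # discard burn-in
print(f"posterior mean branch length: {sum(kept) / len(kept):.3f}")
```

With one parameter the chain mixes quickly; the convergence problems described above appear when the same accept/reject loop has to explore a vast space of topologies and model parameters.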

Probabilistic supertrees [61] synthesize information from a set of input gene trees to infer an overall species tree while allowing for some disagreement between the histories of the individual genes, whether due to horizontal transfer or more prosaic sources of phylogenetic error. They, therefore, represent an interesting and potentially very valuable ‘middle ground’ between the complex, hierarchical models of gene and genome evolution described by Szöllősi et al. [43] and Lartillot [60] and the simpler ‘supermatrix’ or concatenation approaches that have frequently been used to investigate the evolutionary history of genomes and species. Early supertree methods based on parsimony are known to have problems, so Akanni et al. [62] used a recently developed Bayesian probabilistic supertree method in their contribution. Their analysis evaluates the evidence for large-scale gene flows from Bacteria into archaeal genomes. Recently published work has argued that large gene flows have been an important factor in the evolution and ecology of Archaea [63,64]. While the supertree they recover for Archaea suggests a strong vertical signal, composite trees including Archaea and Bacteria were poorly resolved for deeper nodes, a pattern that Akanni et al. [62] attribute to a mixture of vertical and horizontal signals, consistent with published work claiming episodic inter-domain transfer. These are intriguing results that raise interesting questions about the different effects of HGT on Bacteria and Archaea, and why these two prokaryotic groups should behave differently. The results also suggest that the archaeal host lineage that merged with the mitochondrial endosymbiont might have been similarly chimaeric in terms of its genome content.

The limited reliability of single gene trees inferred using overly simple methods is at the core of a number of contributions to this issue. Moreira & López-García [65] discuss how better trees have been used to evaluate proposals that viruses have played a key role in eukaryotic origins. These ideas were originally prompted by the discovery of the Megaviridae, giant amoeba-infecting viruses whose unexpectedly large genomes (1–2.5 Mbp, comparable in size with many cellular genomes) encode homologues of core components of the eukaryotic DNA replication and translation machineries [66–68]. Viral homologues branched outside the eukaryotic clade in early trees, suggesting that an ancient Megavirus, perhaps part of a ‘fourth domain’ of life, might have donated these genes to the ancestral eukaryote [67]. Moreira & López-García [65] note that placing viruses in phylogenetic trees is exceptionally challenging because of their high rates of sequence evolution, which—not unlike the deep divergences between the cellular domains—can induce artefacts such as long-branch attraction, the spurious grouping of fast-evolving sequences due to chance convergences in the substitution process. Their new analyses, in combination with a review of recent work, lead them to suggest that the presence of eukaryotic genes on viral genomes is best explained by horizontal acquisition from their eukaryotic hosts. They conclude that there is no compelling support for a viral contribution to the origin of eukaryotes or for the hypothesis that viruses represent a primaeval fourth domain of life.
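The multiple-hit problem behind such artefacts can be illustrated with the Jukes–Cantor (JC69) model, chosen here only because it is the simplest substitution model. As the true number of substitutions per site grows, the observed proportion of differing sites levels off towards 0.75, so deep or fast-evolving branches carry less and less recoverable signal. This sketch is not the analysis used by Moreira & López-García [65].

```python
import math


def jc_expected_p(d):
    """Expected proportion of differing sites after d substitutions per site (JC69)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc_distance(p):
    """Invert JC69: estimate substitutions per site from an observed p-distance."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p-distance must lie in [0, 0.75) for a JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"true d = {d:3.1f}   expected observed p = {jc_expected_p(d):.3f}")
```

The printed values climb steeply at first and then flatten near 0.75; `jc_distance` refuses p-distances at or above that ceiling because the data no longer determine the distance.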

Many of the contributions in the volume favour hypotheses that have prokaryotes first and eukaryotes as a derived group formed through a merger involving Archaea and Bacteria. This prokaryote to eukaryote polarization of cellular evolution is consistent with published data using ancient paralogues and phylogenetic networks to root the universal tree on the bacterial stem [1,69–72]. It is also consistent with—albeit patchy and incomplete—fossil evidence for prokaryotes and prokaryotic metabolism more than a billion years before the earliest eukaryotic fossils [12,73,74], and with the observation that all known eukaryotes have a mitochondrial homologue—implying that the origin of alpha-proteobacteria occurred before the radiation of known eukaryotes [28]. Nevertheless, the trees used for paralogue rooting were inferred using overly simple phylogenetic methods that are known to be unreliable for reconstructing ancient events [5,75], leaving room for criticism and debate. As a result, hypotheses that eukaryotes, or at least cells carrying much of the complexity that we associate with eukaryotes, might pre-date prokaryotes have persisted in the literature [6,7]. In these ‘eukaryotes first’ or ‘eukaryotes early’ scenarios, all three groups of cellular life are either held to have arisen contemporaneously, or prokaryotes are proposed to have originated through simplification of a complex ancestor that possessed many of the features that persist in modern eukaryotes [76–78]. Mariscal & Doolittle [79] provide a lucid historical overview of ‘eukaryotes first’ scenarios, examining their original motivations and discussing how they have fared as new data have accumulated. Their contribution brings clarity to a confusing and sometimes contradictory literature and, importantly, it attempts to clarify what is meant by ‘eukaryotes’ in ‘eukaryotes first’ and to identify how these ideas might be tested.

Gouy et al. [80] tackle the question of what came first from a methodological perspective, questioning whether alternatives to the bacterial root depicted in universal trees (figure 1) can really be rejected, given the limitations of the models used to recover it [69–71]. They suggest that the use of better models and more careful attention to the properties of data are needed to re-evaluate the root position, and we firmly agree that this is urgently needed. In particular, the inference that eukaryotes branch within Archaea presumes a root outside of those two groups—a tenuous assumption, according to Gouy et al. [80]. They also argue that the preference for the bacterial root is influenced by a persistent bias that favours simple to complex evolutionary scenarios, an unhelpful progressivist attitude that is also criticized by Mariscal & Doolittle [79].

The question of how best to root phylogenetic trees is an outstanding one at all levels of the taxonomic hierarchy, with the recent controversy about the root of the eukaryotic tree providing another important example [81–83]. Most of the published tree-based methods for rooting rely on outgroup rooting. This has well-known problems, because the outgroup is often highly divergent from the ingroup, and this makes analyses susceptible to the well-known long-branch artefact that has bedevilled work on early evolution, as discussed by a number of our contributors. As an alternative to outgroup rooting, Williams et al. [84] evaluate the potential of non-reversible and non-stationary substitution models, which infer the root of the tree as an integral part of the analysis. These are models in which the probability of the tree depends on the starting point of the substitutional process, so that the inferred trees are rooted. These methods have previously shown promise [85,86], but have not been applied more generally because of the additional computational burden of model fitting in comparison to standard models. Two recently described models were applied to infer the root of the universal tree and obtained a root either within the bacterial domain or on the branch separating the Bacteria and Archaea, providing some support for prokaryotes-first hypotheses and suggesting that gene sequences contain a rooting signal that can be extracted. However, as with the methods discussed by Lartillot [60] and Gouy et al. [80], current implementations are slow, limiting the size of the datasets that can be analysed—a serious difficulty given the established importance of broad taxonomic sampling for inferring phylogenetic trees [59] and the models, while promising, are by no means consummate.
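Why such models can carry a rooting signal can be sketched with a toy two-leaf tree. Under a stationary, reversible model, sliding the root along the path between the leaves leaves the likelihood unchanged (Felsenstein's "pulley principle"); once the root composition departs from equilibrium, the likelihood depends on where the root sits. A 2-state chain is always time-reversible, so this sketch isolates only the non-stationary ingredient; the rates, branch lengths and root frequencies are arbitrary assumptions, and these are not the models applied by Williams et al. [84].

```python
import math


def two_state_p(t, a, b):
    """Transition matrix P(t) of a 2-state chain with rates a (0 -> 1) and b (1 -> 0)."""
    s = a + b
    pi0, pi1 = b / s, a / s  # equilibrium frequencies
    e = math.exp(-s * t)
    return [[pi0 + pi1 * e, pi1 * (1.0 - e)],
            [pi0 * (1.0 - e), pi1 + pi0 * e]]


def site_likelihood(x, y, t_left, t_right, root_freqs, a, b):
    """Likelihood of leaf states x and y on a two-leaf tree rooted between them."""
    p_left = two_state_p(t_left, a, b)
    p_right = two_state_p(t_right, a, b)
    return sum(root_freqs[r] * p_left[r][x] * p_right[r][y] for r in (0, 1))


a, b, total = 1.0, 1.0, 1.0
equilibrium = (b / (a + b), a / (a + b))
skewed = (0.9, 0.1)  # root composition away from equilibrium (non-stationary)
for t_left in (0.1, 0.5, 0.9):
    t_right = total - t_left  # slide the root along a path of fixed length
    print(t_left,
          round(site_likelihood(0, 1, t_left, t_right, equilibrium, a, b), 4),
          round(site_likelihood(0, 1, t_left, t_right, skewed, a, b), 4))
```

The equilibrium column stays constant as the root moves, while the skewed column changes, so in the non-stationary case the data can in principle favour one root position over another.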

6. Some concluding remarks

Inferring ancient events from small amounts of data using methods that are not completely up to the job is unlikely to be error-free, and some views will no doubt change again. Nonetheless, the papers in this theme issue, and those in another recent collection [87], testify to an era of remarkable excitement in the field of eukaryotic origins. The debate about the relative importance of non-endosymbiotic gene transfer, and of bulk versus continual transfer, as sources of prokaryotic genes on eukaryotic and archaeal genomes is particularly vibrant. Some of the discussion is fuelled by the inherent difficulties in trying to infer events from trees that are poorly resolved, because of saturation and other complexities of gene evolution, and also because of still limited sampling of microbial diversity. Nevertheless, it is very clear, and has been for some time [8,9,88], that widespread HGT means that no single tree can depict the history of all genes on prokaryotic or eukaryotic genomes. Trees and non-tree-based methods like networks will continue to be complementary and synergistic approaches for analysing how genomes evolve.

One area that is particularly exciting is the exploration of uncultured microbial diversity, which has the potential to home in on the closest extant relatives of the mitochondrial endosymbiont [89] and of the proposed archaeal host lineage [23] and provide an experimental framework for testing currently favoured hypotheses. Those partners in early eukaryotic evolution, like all ancestors, are long extinct, but better sampling of their modern relatives can help to improve trees and to refine inferences about the gene content and cellular features of our prokaryotic ancestors. The discovery of the Lokiarchaeota, with their enhanced content of genes previously thought to be eukaryote-specific, is particularly exciting and provides evidence that phylogenetic methods, however imperfect, can be used to infer ancient relationships [24]. But sequence data can only take us so far, and a major challenge now is to isolate Lokiarchaeota and other relevant environmental lineages into culture so that the cellular manifestation of their genome content (their biology and physiology) can actually be studied in the laboratory.


Background

The emergence of the eukaryotic cell with its nucleus, endomembrane system, and membrane-bound organelles represented a quantum leap in complexity beyond anything seen in prokaryotes [1]-[3]. The sophisticated cellular compartmentalization and the symbiotic association with mitochondria are thought to have enabled eukaryotes to adopt new ecological roles and provided a precursor to numerous successful origins of multicellularity. Nevertheless, despite being recognized as the single most profound evolutionary transition in cellular organization, the origins of the eukaryotic cell remain poorly understood.

The key events in the evolution of eukaryotes were the acquisition of the nucleus, the endomembrane system, and mitochondria. It is now established beyond reasonable doubt that mitochondria are derived from endosymbiotic α-proteobacteria [4]-[6]. Existing models for the origin of eukaryotes generally agree that proto-mitochondria entered the cell via phagocytosis. Likewise, the most widely favored models for the origins of the nucleus assume that it was formed within a prokaryotic cell as the result of invaginations of the plasma membrane - whether by phagocytosis of an endosymbiont that corresponds to the nuclear compartment or by the internalization of membranes that became organized around the chromatin (reviewed in [7] and discussed further below). Thus, existing theories for the origin of eukaryotes share the assumption that the nucleus is a novel structure formed within the boundaries of an existing, and largely unaltered, plasma membrane [8] - they are outside-in models.

Here, we set out to challenge the outside-in perspective. Archaea often generate extracellular protrusions [9]-[14], but are not known to undergo processes akin to endocytosis or phagocytosis. Therefore, we suggest that eukaryotic cell architecture arose as the result of membrane extrusion. In brief, we propose that eukaryotes evolved from a prokaryotic cell with a single bounding membrane that extended extracellular protrusions that fused to give rise to the cytoplasm and endomembrane system. Under this inside-out model, the nuclear compartment, equivalent to the ancestral prokaryotic cell body, is the oldest part of the cell and remained structurally intact during the transition from prokaryotic to eukaryotic cell organization.

The inside-out model provides a simple stepwise path for the evolution of eukaryotes, which, we argue, fits the existing data at least as well as any current theory. Further, it sheds new light on previously enigmatic features of eukaryotic cell biology, including those that led others to suggest the need to revise current cell theory [15]. Given the large number of testable predictions made by our model, and its potential to stimulate new empirical research, we argue that the inside-out model deserves consideration as a new theory for the origin of eukaryotes.

Overview of existing models of eukaryotic cell evolution

Endosymbiotic, outside-in models explain the origin of the nucleus and mitochondria as being the result of sequential rounds of phagocytosis and endosymbiosis. These models invoke three partners - host, nucleus, and mitochondria - and envisage the nuclear compartment being derived from an endosymbiont that was engulfed by a host cell. Authors have suggested that the host (that is, cytoplasm) could be an archaeon [16]-[18], a proteobacterium [19]-[21], or a bacterium of the Planctomycetes, Verrucomicrobia, Chlamydiae (PVC) superphylum [22]. The endosymbiont (that is, the nucleus) has been proposed to have been an archaeon [19]-[22], a spirochete [16], or a membrane-bound virus [17],[18]. In general, endosymbiotic models are agnostic as to whether mitochondria were acquired before or after the nucleus. An exception to this is the syntrophic consortium model, which envisages the simultaneous fusion of a symbiotic community composed of all three partners: cytoplasm, nucleus, and mitochondria [23],[24]. A more divergent 'endosymbiotic' model is the endospore model [25]. This holds that the nucleus evolved when a cell enclosed its sister after cell division, similar to the way in which endospores are formed in certain Gram-positive bacteria. However, there is no evidence of endospore formation or other engulfment processes in Archaea, making this hypothesis improbable.

Recent phylogenomic analyses have revealed that the eukaryotic genome likely represents a combination of two genomes, one archaeal [26],[27] and one proteobacterial [28],[29]. There is no evidence to support any additional, major genome donor as expected under nuclear endosymbiotic models [30]. Furthermore, endosymbiotic models (including the endospore model) require supplemental theories to explain the origin of the endomembrane system, the physical continuity of inner and outer nuclear membranes, and the formation of nuclear pores. In light of these facts, we do not think that endosymbiosis provides a convincing explanation for the origin of the nuclear compartment [2],[7],[31]-[33].

Given the problems with endosymbiotic models, we believe that the most compelling current models for the origin of eukaryotes are those that invoke an autogenous origin of the nucleus. These usually suggest that a prokaryotic ancestor evolved the ability to invaginate membranes to generate internal membrane-bound compartments, which became organized around chromatin to generate a nucleus [32],[34]-[36]. In some models, infoldings of the plasma membrane were pinched off to form endoplasmic reticulum (ER)-like internal compartments that later became organized around the chromatin to form the inner and outer nuclear envelope [35],[37]-[39]. Alternatively, the nuclear membranes could be seen as arising from invaginations of the plasma membrane, so that the early eukaryote cell had an ER and nuclear envelope that were continuous with the outer cell membrane [40]. In either case, under these models the nuclear membrane is ultimately derived from internalized plasma membrane.

Older autogenous outside-in models generally proposed that mitochondria were acquired by a cell that already had a nucleus [32],[34],[35] - in line with the results of early phylogenetic studies [41]. More recent phylogenetic data have suggested that mitochondria were present in the last eukaryotic common ancestor [42],[43]. This has led to the formulation of new autogenous models in which the acquisition of mitochondria predates the formation of the nuclear compartment [1],[23],[44]-[46].

Overview of the inside-out model

In the following sections we outline a series of simple evolutionary steps from a prokaryotic to a fully eukaryotic cell structure, driven primarily by selection for an increasingly intimate mutualistic association between an archaeal host cell and α-proteobacteria (proto-mitochondria), which initially lived on the host cell surface (Figure 1). Under the inside-out hypothesis, the outer nuclear membrane, plasma membrane, and cytoplasm were derived from extracellular protrusions (blebs), whereas the ER represents the spaces between blebs (Table 1). Mitochondria were initially trapped in the ER, but later penetrated the ER membrane to enter the cytoplasm proper. Under the inside-out model, the final step in eukaryogenesis was the formation of a continuous plasma membrane, which closed off the ER from the exterior.

Inside-out model for the evolution of eukaryotic cell organization. Model showing the stepwise evolution of eukaryotic cell organization from (A) an eocyte ancestor with a single bounding membrane and a glycoprotein-rich cell wall (S-layer) interacting with epibiotic α-proteobacteria (proto-mitochondria). (B) We envision the eocyte cell forming protrusions, aided by protein-membrane interactions at the protrusion neck. These protrusions facilitated material exchange with proto-mitochondria. (C) Selection for a greater area of contact between the symbionts would have led to bleb enlargement and the eventual loss of the S-layer from the protrusions. (D) Blebs would have then been further stabilized by the development of a symmetric nuclear pore outer ring complex (Figure 2) and through the establishment of LINC complexes that, following the gradual loss of the S-layer, physically connected the original cell body (the nascent nuclear compartment) to the inner bleb membranes. (E) With the expansion of blebs to enclose the proto-mitochondria, a process that would have facilitated the acquisition of bacterial lipid biosynthesis machinery by the host, the site of cell growth would have progressively shifted to the cytoplasm, facilitated by the development of regulated traffic through the nuclear pore. At the same time, the spaces between blebs would have enabled the gradual maturation of proteins secreted into the environment via the perinuclear space through glycosylation and proteolytic cleavage. (F) Finally, bleb fusion would have connected cytoplasmic compartments and driven the formation of an intact plasma membrane, perhaps through a process akin to phagocytosis whereby one bleb enveloped the whole.
This simple topological transition would have isolated the endoplasmic reticulum from the outside world, driven the full development of a system of vesicular trafficking, and established strict vertical transmission of mitochondria, leading to a cell with modern eukaryotic cell organization.

Only one other paper that we are aware of has proposed that the nuclear compartment corresponds to boundaries of an ancestral cell. The exomembrane hypothesis of de Roos [47] is, however, quite distinct from the model put forward here. De Roos postulated that the starting point was a proto-eukaryote with a double membrane that secreted membranous extracellular vesicles that fused to form an enclosing plasma membrane. Moreover, his model relies on an unconventional view of evolutionary history, including an independent origin of eukaryotic and prokaryotic cells. Thus, we will not discuss the exomembrane hypothesis further.

In the following sections, we describe the inside-out model in detail. We discuss the cellular processes involved in the generation of the cytoplasmic compartment, the vesicle trafficking system and plasma membrane, and cilia and flagella. In each section we point to relevant selective drivers and supporting evidence. Finally, we look at some of the implications and testable predictions of the model and conclude by reflecting on the prospects for determining which of the models, inside-out or outside-in, is more likely to be correct.


On the origin of mitosing cells

A theory of the origin of eukaryotic cells ("higher" cells which divide by classical mitosis) is presented. By hypothesis, three fundamental organelles: the mitochondria, the photosynthetic plastids and the (9+2) basal bodies of flagella were themselves once free-living (prokaryotic) cells. The evolution of photosynthesis under the anaerobic conditions of the early atmosphere to form anaerobic bacteria, photosynthetic bacteria and eventually blue-green algae (and protoplastids) is described. The subsequent evolution of aerobic metabolism in prokaryotes to form aerobic bacteria (protoflagella and protomitochondria) presumably occurred during the transition to the oxidizing atmosphere. Classical mitosis evolved in protozoan-type cells millions of years after the evolution of photosynthesis. A plausible scheme for the origin of classical mitosis in primitive amoeboflagellates is presented. During the course of the evolution of mitosis, photosynthetic plastids (themselves derived from prokaryotes) were symbiotically acquired by some of these protozoans to form the eukaryotic algae and the green plants. The cytological, biochemical and paleontological evidence for this theory is presented, along with suggestions for further possible experimental verification. The implications of this scheme for the systematics of the lower organisms are discussed.



Genes may overlap in a variety of ways and can be classified by their positions relative to each other. [2] [6] [7] [8] [9]

  • Unidirectional or tandem overlap: the 3' end of one gene overlaps with the 5' end of another gene on the same strand. This arrangement can be symbolized with the notation → → where arrows indicate the reading frame from start to end.
  • Convergent or end-on overlap: the 3' ends of the two genes overlap on opposite strands. This can be written as → ←.
  • Divergent or tail-on overlap: the 5' ends of the two genes overlap on opposite strands. This can be written as ← →.
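A minimal sketch of how the three orientation classes could be assigned from gene coordinates. The `Gene` record, its 0-based half-open coordinates and the strand labels are assumptions for illustration; the rule simply asks which ends of the two genes meet. Nested overlaps, where one gene lies entirely inside another on the opposite strand, are not distinguished here.

```python
from typing import NamedTuple, Optional


class Gene(NamedTuple):
    start: int   # leftmost coordinate, 0-based, inclusive
    end: int     # rightmost coordinate, exclusive
    strand: str  # "+" or "-"


def overlap_length(a: Gene, b: Gene) -> int:
    """Number of base pairs shared by the two genes (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def overlap_type(a: Gene, b: Gene) -> Optional[str]:
    """Classify an overlap as unidirectional, convergent or divergent."""
    if overlap_length(a, b) == 0:
        return None
    if a.strand == b.strand:
        return "unidirectional"  # -> -> : 3' end of one meets 5' end of the other
    left = min((a, b), key=lambda g: g.start)
    # A "+" gene on the left points its 3' end rightwards into the "-" gene,
    # whose 3' end lies at its left edge: the 3' ends meet (-> <-).
    # Otherwise the 5' ends meet (<- ->).
    return "convergent" if left.strand == "+" else "divergent"


print(overlap_type(Gene(0, 120, "+"), Gene(100, 300, "-")))  # convergent
```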

Overlapping genes can also be classified by phases, which describe their relative reading frames: [2] [6] [7] [8] [9]

  • In-phase overlap occurs when the shared sequences use the same reading frame. This is also known as "phase 0". Unidirectional genes with phase 0 overlap are not considered distinct genes, but rather as alternative start sites of the same gene.
  • Out-of-phase overlaps occur when the shared sequences use different reading frames. This can occur in "phase 1" or "phase 2", depending on whether the reading frames are offset by 1 or 2 nucleotides. Because a codon is three nucleotides long, an offset of three nucleotides is an in-phase, phase 0 frame.
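For two genes on the same strand, the phase reduces to the distance between their start codons modulo 3. The sketch below assumes both starts are given as the first base of each start codon and measures the offset of the second gene relative to the first; which offset is labelled phase 1 versus phase 2 is an assumed convention, since naming varies between studies.

```python
def same_strand_phase(start_a: int, start_b: int) -> int:
    """Reading-frame offset of gene b relative to gene a (same strand).

    start_a and start_b are the first bases of each start codon; the result is
    0 (in-phase), 1 or 2 (out-of-phase). Which offset is labelled 1 versus 2
    is an assumed convention here.
    """
    return (start_b - start_a) % 3


def describe_phase(start_a: int, start_b: int) -> str:
    phase = same_strand_phase(start_a, start_b)
    return "in-phase (phase 0)" if phase == 0 else f"out-of-phase (phase {phase})"


for offset in (1, 2, 3, 4):
    print(offset, describe_phase(100, 100 + offset))
```

An offset of 3 maps back to phase 0, matching the point that a three-nucleotide shift preserves the reading frame.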

Overlapping genes are particularly common in rapidly evolving genomes, such as those of viruses, bacteria, and mitochondria. They may originate in three ways: [10]

  1. By extension of an existing open reading frame (ORF) downstream into a contiguous gene due to the loss of a stop codon
  2. By extension of an existing ORF upstream into a contiguous gene due to loss of an initiation codon
  3. By generation of a novel ORF within an existing one due to a point mutation.
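The first route, loss of a stop codon, can be sketched with a toy sequence. The two short "genes", the sequence and the chosen mutation are invented for illustration; the ORF scan reads forward only and uses the standard stop codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_end(seq: str, start: int):
    """Return the exclusive end of the ORF beginning at `start`, or None if no stop is found."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i + 3  # include the stop codon
    return None


# Toy sequence: gene A = ATG AAA TAA (0-9), gene B = ATG CCC TGA (10-19).
seq = "ATGAAATAAGATGCCCTGACCTAG"
mutant = seq[:8] + "C" + seq[9:]  # point mutation turns A's stop TAA into TAC

a_end, b_end = orf_end(seq, 0), orf_end(seq, 10)
a_end_mut = orf_end(mutant, 0)
print("gene A ends at", a_end, "before and", a_end_mut, "after the mutation")
print("overlap with gene B:", max(0, min(a_end_mut, b_end) - 10), "bp")
```

After the mutation, gene A reads through to the next in-frame stop and now covers all of gene B. Because gene B starts 10 nucleotides after gene A (10 mod 3 = 1), the shared 9 bp are read in a different frame by each gene.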

The use of the same nucleotide sequence to encode multiple genes may provide evolutionary advantage due to reduction in genome size and due to the opportunity for transcriptional and translational co-regulation of the overlapping genes. [7] [11] [12] [13] Gene overlaps introduce novel evolutionary constraints on the sequences of the overlap regions. [9] [14]
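One way to see the extra constraint is to count how many point substitutions in a doubly read region leave the encoded protein unchanged in one frame versus in both frames. The sketch below assumes the standard genetic code, two same-strand frames offset by one nucleotide and an arbitrary toy sequence; it considers only single-base substitutions and ignores codon usage and selection.

```python
import itertools

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str, frame: int) -> str:
    """Translate complete codons of `seq` starting at offset `frame` ('*' marks stops)."""
    n_codons = (len(seq) - frame) // 3
    return "".join(CODE[seq[frame + 3 * i: frame + 3 * i + 3]] for i in range(n_codons))


def codon_positions(seq: str, frame: int) -> set:
    """Positions of `seq` that fall inside a complete codon of the given frame."""
    n_codons = (len(seq) - frame) // 3
    return set(range(frame, frame + 3 * n_codons))


def silent_fraction(seq: str, frames, positions) -> float:
    """Fraction of single-base substitutions at `positions` that change no protein in `frames`."""
    originals = {f: translate(seq, f) for f in frames}
    total = silent = 0
    for i in sorted(positions):
        for base in BASES:
            if base == seq[i]:
                continue
            mutant = seq[:i] + base + seq[i + 1:]
            total += 1
            if all(translate(mutant, f) == originals[f] for f in frames):
                silent += 1
    return silent / total


seq = "ATGCTGGCTGCAAAAGGTCTGCCATGA"  # arbitrary toy sequence
shared = codon_positions(seq, 0) & codon_positions(seq, 1)  # region read in both frames
print("silent in frame 0 alone:", round(silent_fraction(seq, (0,), shared), 3))
print("silent in frames 0 and 1:", round(silent_fraction(seq, (0, 1), shared), 3))
```

Because a substitution must now be silent in both frames at once, the second fraction can never exceed the first.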

Origins of new genes

In 1977, Pierre-Paul Grassé proposed that one of the genes in the pair could have originated de novo by mutations introducing novel ORFs in alternate reading frames; he described the mechanism as overprinting. [15]:231 It was later substantiated by Susumu Ohno, who identified a candidate gene that may have arisen by this mechanism. [16] Some de novo genes originating in this way may not remain overlapping, but subfunctionalize following gene duplication, [3] contributing to the prevalence of orphan genes. Which member of an overlapping gene pair is younger can be identified bioinformatically either by a more restricted phylogenetic distribution or by less optimized codon usage. [4] [17] [18] Younger members of the pair tend to have higher intrinsic structural disorder than older members, but the older members are also more disordered than other proteins, presumably as a way of alleviating the increased evolutionary constraints posed by overlap. [17] Overlaps are more likely to originate in proteins that already have high disorder. [17]

Overlapping genes occur in all domains of life, though with varying frequencies. They are especially common in viral genomes.

Viruses

The existence of overlapping genes was first identified in viruses: the first DNA genome ever sequenced, that of the bacteriophage ΦX174, contained several examples. [19] Another example is the ORF3d gene in the SARS-CoV-2 virus. [1] [21] Overlapping genes are particularly common in viral genomes. [4] Some studies attribute this observation to selective pressure toward small genome sizes mediated by the physical constraints of packaging the genome in a viral capsid, particularly one of icosahedral geometry. [22] However, other studies dispute this conclusion and argue that the distribution of overlaps in viral genomes is more likely to reflect overprinting as the evolutionary origin of overlapping viral genes. [23] Overprinting is a common source of de novo genes in viruses. [18]

Studies of overprinted viral genes suggest that their protein products tend to be accessory proteins which are not essential to viral proliferation, but contribute to pathogenicity. Overprinted proteins often have unusual amino acid distributions and high levels of intrinsic disorder. [24] In some cases overprinted proteins do have well-defined, but novel, three-dimensional structures. [25] One example is the RNA silencing suppressor p19 found in Tombusviruses, which has both a novel protein fold and a novel binding mode in recognizing siRNAs. [18] [20] [26]

Prokaryotes

Estimates of gene overlap in bacterial genomes typically find that around one third of bacterial genes are overlapped, though usually only by a few base pairs. [7] [27] [28] Most studies of overlap in bacterial genomes find evidence that overlap serves a function in gene regulation, permitting the overlapped genes to be transcriptionally and translationally co-regulated. [7] [13] In prokaryotic genomes, unidirectional overlaps are most common, possibly due to the tendency of adjacent prokaryotic genes to share orientation. [7] [9] [6] Among unidirectional overlaps, long overlaps are more commonly read with a one-nucleotide offset in reading frame (i.e., phase 1), and short overlaps are more commonly read in phase 2. [28] [29] Long overlaps of greater than 60 base pairs are more common for convergent genes; however, putative long overlaps have very high rates of misannotation. [30] Robustly validated examples of long overlaps in bacterial genomes are rare: in the well-studied model organism Escherichia coli, only four gene pairs are well validated as having long, overprinted overlaps. [31]
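For same-strand (unidirectional) overlaps, overlap length and phase follow directly from gene coordinates. The minimal sketch below assumes 1-based inclusive coordinates with the upstream gene starting first, and defines phase as the shift of the downstream gene's reading frame relative to the upstream gene's, (downstream start - upstream start) mod 3. Published studies use differing phase conventions, so treat this definition and the helper name as assumptions; the 60 bp long/short cutoff is the one given in the text above.

```python
def unidirectional_overlap(up_start, up_end, down_start, down_end, long_cutoff=60):
    """Classify a same-strand overlap between two genes.

    Coordinates are 1-based and inclusive, and the upstream gene starts first.
    Returns (overlap_length, phase, "long" or "short"), or None if the genes
    do not overlap. Phase is the frame shift of the downstream gene relative
    to the upstream gene, an assumed convention.
    """
    if down_start < up_start:
        raise ValueError("the upstream gene must start first")
    overlap = min(up_end, down_end) - down_start + 1
    if overlap <= 0:
        return None
    phase = (down_start - up_start) % 3
    kind = "long" if overlap > long_cutoff else "short"
    return overlap, phase, kind


if __name__ == "__main__":
    print(unidirectional_overlap(1, 9, 6, 14))      # 4 bp "ATGA" overlap: (4, 2, 'short')
    print(unidirectional_overlap(1, 300, 200, 500))  # (101, 1, 'long')
```

When both genes consist of whole codons and the downstream gene extends past the upstream stop codon, the phase depends only on the overlap length: it equals (-overlap) mod 3. Under this convention the common 4 bp ATGA and 1 bp TAATG overlaps both come out as phase 2.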

Eukaryotes

Compared to prokaryotic genomes, eukaryotic genomes are often poorly annotated, and thus identifying genuine overlaps is relatively challenging. [18] However, examples of validated gene overlaps have been documented in a variety of eukaryotic organisms, including mammals such as mice and humans. [32] [33] [34] [35] Eukaryotes differ from prokaryotes in the distribution of overlap types: while unidirectional (i.e., same-strand) overlaps are most common in prokaryotes, opposite-strand (antiparallel) overlaps are more common in eukaryotes. Among the opposite-strand overlaps, convergent orientation is most common. [33] Most studies of eukaryotic gene overlap have found that overlapping genes are extensively subject to genomic reorganization even in closely related species, and thus the presence of an overlap is not always well conserved. [34] [36] Overlap with older or less taxonomically restricted genes is also a common feature of genes likely to have originated de novo in a given eukaryotic lineage. [34] [37] [38]

The precise functions of overlapping genes seem to vary across the domains of life, but several experiments have shown that they are important for virus life cycles through proper protein expression and stoichiometry, [39] as well as playing a role in proper protein folding. [40] A version of bacteriophage ΦX174 has also been created in which all gene overlaps were removed, [41] showing that they are not necessary for replication.



In the second half of the 19th century, Gregor Mendel's pioneering work on the inheritance of traits in pea plants suggested that specific “factors” (today established as genes) are responsible for transferring organismal traits between generations. [4] Although proteins were initially assumed to serve as the hereditary material, Avery, MacLeod, and McCarty established roughly eight decades later that DNA, which had been discovered by Friedrich Miescher, is the carrier of genetic information. [5] These findings paved the way for research uncovering the chemical nature of DNA and the rules for encoding genetic information, and ultimately led to the proposal of the double-helical structure of DNA by Watson and Crick. [6] This three-dimensional model of DNA illuminated potential mechanisms by which the genetic information could be copied in a semiconservative manner prior to cell division, a hypothesis that was later experimentally supported by Meselson and Stahl using isotope incorporation to distinguish parental from newly synthesized DNA. [7] [8] The subsequent isolation of DNA polymerases, the enzymes that catalyze the synthesis of new DNA strands, by Kornberg and colleagues pioneered the identification of many different components of the biological DNA replication machinery, first in the bacterial model organism E. coli, but later also in eukaryotic life forms. [2] [9]

A key prerequisite for DNA replication is that it must occur with extremely high fidelity and efficiency, exactly once per cell cycle, to prevent the accumulation of genetic alterations with potentially deleterious consequences for cell survival and organismal viability. [10] Incomplete, erroneous, or untimely DNA replication events can give rise to mutations, chromosomal polyploidy or aneuploidy, and gene copy number variations, each of which in turn can lead to diseases, including cancer. [11] [12] To ensure complete and accurate duplication of the entire genome and the correct flow of genetic information to progeny cells, all DNA replication events are not only tightly coupled to cell cycle cues but are also coordinated with other cellular events such as transcription and DNA repair. [2] [13] [14] [15] Additionally, origin sequences commonly have a high AT content across all kingdoms, because AT-rich stretches are easier to separate: their base-stacking interactions are weaker than those of GC-rich stretches. [16]
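Because AT-rich stretches separate more easily, a simple way to flag candidate unwinding regions in a sequence is a sliding-window AT-content scan. The sketch below is only an illustration under stated assumptions: the window size and threshold are arbitrary, the function names are hypothetical, and AT content alone does not identify a replication origin.

```python
def at_content(seq):
    """Fraction of A and T bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "AT" for base in seq) / len(seq)


def at_rich_windows(seq, window=20, threshold=0.8):
    """0-based start indices of windows whose AT fraction is >= threshold.
    The default window size and threshold are illustrative assumptions."""
    seq = seq.upper()
    return [i for i in range(len(seq) - window + 1)
            if at_content(seq[i:i + window]) >= threshold]


if __name__ == "__main__":
    print(at_rich_windows("GGGGAAAATTTTGGGG", window=8, threshold=1.0))  # [4]
```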

DNA replication is divided into different stages. During initiation, the replication machineries, termed replisomes, are assembled on DNA in a bidirectional fashion. These assembly loci constitute the start sites of DNA replication, or replication origins. In the elongation phase, replisomes travel in opposite directions with the replication forks, unwinding the DNA helix and synthesizing complementary daughter DNA strands using both parental strands as templates. Once replication is complete, specific termination events lead to the disassembly of replisomes. As long as the entire genome is duplicated before cell division, one might assume that the location of replication start sites does not matter; yet it has been shown that many organisms use preferred genomic regions as origins. [17] [18] The necessity to regulate origin location likely arises from the need to coordinate DNA replication with other processes that act on the shared chromatin template, to avoid DNA strand breaks and DNA damage. [2] [12] [15] [19] [20] [21] [22] [23]

More than five decades ago, Jacob, Brenner, and Cuzin proposed the replicon hypothesis to explain the regulation of chromosomal DNA synthesis in E. coli. [24] The model postulates that a diffusible, trans-acting factor, a so-called initiator, interacts with a cis-acting DNA element, the replicator, to promote replication onset at a nearby origin. Once bound to replicators, initiators (often with the help of co-loader proteins) deposit replicative helicases onto DNA, which subsequently drive the recruitment of additional replisome components and the assembly of the entire replication machinery. The replicator thereby specifies the location of replication initiation events, and the chromosome region that is replicated from a single origin or initiation event is defined as the replicon. [2]

A fundamental feature of the replicon hypothesis is that it relies on positive regulation to control DNA replication onset, which can explain many experimental observations in bacterial and phage systems. [24] For example, it accounts for the failure of extrachromosomal DNAs without origins to replicate when introduced into host cells. It further rationalizes plasmid incompatibilities in E. coli, where certain plasmids destabilize each other's inheritance due to competition for the same molecular initiation machinery. [25] By contrast, a model of negative regulation (analogous to the repressor-operator model for transcription) fails to explain the above findings. [24] Nonetheless, research subsequent to Jacob's, Brenner's and Cuzin's proposal of the replicon model has discovered many additional layers of replication control in bacteria and eukaryotes that comprise both positive and negative regulatory elements, highlighting both the complexity and the importance of restricting DNA replication temporally and spatially. [2] [26] [27] [28]

The concept of the replicator as a genetic entity has proven very useful in the quest to identify replicator DNA sequences and initiator proteins in prokaryotes, and to some extent also in eukaryotes, although the organization and complexity of replicators differ considerably between the domains of life. [29] [30] While bacterial genomes typically contain a single replicator that is specified by consensus DNA sequence elements and that controls replication of the entire chromosome, most eukaryotic replicators, with the exception of those of budding yeast, are not defined at the level of DNA sequence; instead, they appear to be specified combinatorially by local DNA structural and chromatin cues. [31] [32] [33] [34] [35] [36] [37] [38] [39] [40] Eukaryotic chromosomes are also much larger than their bacterial counterparts, raising the need for initiating DNA synthesis from many origins simultaneously to ensure timely replication of the entire genome. Additionally, many more replicative helicases are loaded than are activated to initiate replication in a given cell cycle. The context-driven definition of replicators and selection of origins suggests a relaxed replicon model in eukaryotic systems that allows for flexibility in the DNA replication program. [29] Although replicators and origins can be spaced physically apart on chromosomes, they often co-localize or are located in close proximity; for simplicity, we will thus refer to both elements as ‘origins’ throughout this review. Taken together, the discovery and isolation of origin sequences in various organisms represent a significant milestone towards gaining a mechanistic understanding of replication initiation. In addition, these accomplishments had profound biotechnological implications for the development of shuttle vectors that can be propagated in bacterial, yeast and mammalian cells. [2] [41] [42] [43]

Most bacterial chromosomes are circular and contain a single origin of chromosomal replication (oriC). Bacterial oriC regions are surprisingly diverse in size (ranging from 250 bp to 2 kbp), sequence, and organization; [45] [46] nonetheless, their ability to drive replication onset typically depends on sequence-specific readout of consensus DNA elements by the bacterial initiator, a protein called DnaA. [47] [48] [49] [50] Origins in bacteria are either continuous or bipartite and contain three functional elements that control origin activity: conserved DNA repeats that are specifically recognized by DnaA (called DnaA-boxes), an AT-rich DNA unwinding element (DUE), and binding sites for proteins that help regulate replication initiation. [17] [51] [52] Interactions of DnaA both with the double-stranded (ds) DnaA-box regions and with single-stranded (ss) DNA in the DUE are important for origin activation and are mediated by different domains in the initiator protein: a helix-turn-helix (HTH) DNA binding element and an ATPase associated with various cellular activities (AAA+) domain, respectively. [53] [54] [55] [56] [57] [58] [59] While the sequence, number, and arrangement of origin-associated DnaA-boxes vary throughout the bacterial kingdom, their specific positioning and spacing in a given species are critical for oriC function and for productive initiation complex formation. [2] [45] [46] [60] [61] [62] [63] [64]
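Sequence-specific readout of DnaA-boxes can be illustrated with a mismatch-tolerant motif scan on both strands. This minimal sketch assumes the commonly cited E. coli DnaA-box consensus 5'-TTATNCACA-3' (N = any base) and an arbitrary tolerance of one mismatch. It ignores the box positioning, spacing, and affinity differences that the text identifies as critical for oriC function, so it is a toy rather than an origin predictor; the helper names are hypothetical.

```python
# Toy DnaA-box scan on both strands; consensus and mismatch tolerance are assumptions.
DNA_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string (N stays N)."""
    return seq.upper().translate(DNA_COMPLEMENT)[::-1]


def mismatches(site, motif):
    """Count mismatched positions; an N in the motif matches any base."""
    return sum(m != "N" and s != m for s, m in zip(site, motif))


def scan_motif(seq, motif="TTATNCACA", max_mismatches=1):
    """Return (index, strand, site) for each window that matches the motif on
    the given strand ("+") or on the reverse complement ("-")."""
    seq = seq.upper()
    k = len(motif)
    rc_motif = revcomp(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        site = seq[i:i + k]
        if mismatches(site, motif) <= max_mismatches:
            hits.append((i, "+", site))
        if mismatches(site, rc_motif) <= max_mismatches:
            hits.append((i, "-", site))
    return hits


if __name__ == "__main__":
    print(scan_motif("GGTTATCCACAGGCCTGTGGATAACC"))
    # [(2, '+', 'TTATCCACA'), (15, '-', 'TGTGGATAA')]
```

Reporting the strand of each hit keeps track of box orientation, which a simple yes/no match would discard.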

Among bacteria, E. coli is a particularly powerful model system to study the organization, recognition, and activation mechanism of replication origins. E. coli oriC comprises an approximately 260 bp region containing four types of initiator binding elements that differ in their affinities for DnaA and their dependencies on the co-factor ATP. DnaA-boxes R1, R2, and R4 constitute high-affinity sites that are bound by the HTH domain of DnaA irrespective of the nucleotide-binding state of the initiator. [47] [65] [66] [67] [68] [69] By contrast, the I, τ, and C-sites, which are interspersed between the R-sites, are low-affinity DnaA-boxes and associate preferentially with ATP-bound DnaA, although ADP-DnaA can substitute for ATP-DnaA under certain conditions. [70] [71] [72] [63] Binding of the HTH domains to the high- and low-affinity DnaA recognition elements promotes ATP-dependent higher-order oligomerization of DnaA's AAA+ modules into a right-handed filament that wraps duplex DNA around its outer surface, thereby generating superhelical torsion that facilitates melting of the adjacent AT-rich DUE. [53] [73] [74] [75] DNA strand separation is additionally aided by direct interactions of DnaA's AAA+ ATPase domain with triplet repeats, so-called DnaA-trios, in the proximal DUE region. [76] The engagement of single-stranded trinucleotide segments by the initiator filament stretches DNA and stabilizes the initiation bubble by preventing reannealing. [57] The DnaA-trio origin element is conserved in many bacterial species, indicating it is a key element for origin function. [76] After melting, the DUE provides an entry site for the E. coli replicative helicase DnaB, which is deposited onto each of the single DNA strands by its loader protein DnaC. [2]

Although the different DNA binding activities of DnaA have been extensively studied biochemically and various apo, ssDNA-, or dsDNA-bound structures have been determined, [56] [57] [58] [74] the exact architecture of the higher-order DnaA-oriC initiation assembly remains unclear. Two models have been proposed to explain the organization of essential origin elements and DnaA-mediated oriC melting. The two-state model assumes a continuous DnaA filament that switches from a dsDNA binding mode (the organizing complex) to an ssDNA binding mode in the DUE (the melting complex). [74] [77] By contrast, in the loop-back model, the DNA is sharply bent in oriC and folds back onto the initiator filament so that DnaA protomers simultaneously engage double- and single-stranded DNA regions. [78] Elucidating exactly how oriC DNA is organized by DnaA thus remains an important task for future studies. Insights into initiation complex architecture will help explain not only how origin DNA is melted, but also how a replicative helicase is loaded directionally onto each of the exposed single DNA strands in the unwound DUE, and how these events are aided by interactions of the helicase with the initiator and specific loader proteins. [2]

Archaeal replication origins share some but not all of the organizational features of bacterial oriC. Unlike bacteria, Archaea often initiate replication from multiple origins per chromosome (one to four have been reported); [79] [80] [81] [82] [83] [84] [85] [86] [46] yet archaeal origins also bear specialized sequence regions that control origin function. [87] [88] [89] These elements include both DNA sequence-specific origin recognition boxes (ORBs or miniORBs) and an AT-rich DUE that is flanked by one or several ORB regions. [85] [90] ORB elements display a considerable degree of diversity in terms of their number, arrangement, and sequence, both among different archaeal species and among different origins within a single species. [80] [85] [91] An additional degree of complexity is introduced by the initiator, Orc1/Cdc6 in archaea, which binds to ORB regions. Archaeal genomes typically encode multiple paralogs of Orc1/Cdc6 that vary substantially in their affinities for distinct ORB elements and that differentially contribute to origin activities. [85] [92] [93] [94] In Sulfolobus solfataricus, for example, three chromosomal origins have been mapped (oriC1, oriC2, and oriC3), and biochemical studies have revealed complex binding patterns of initiators at these sites. [85] [86] [95] [96] The cognate initiator for oriC1 is Orc1-1, which associates with several ORBs at this origin. [85] [93] OriC2 and oriC3 are bound by both Orc1-1 and Orc1-3. [85] [93] [96] Conversely, a third paralog, Orc1-2, footprints at all three origins but has been postulated to negatively regulate replication initiation. [85] [96] Additionally, the WhiP protein, an initiator unrelated to Orc1/Cdc6, has been shown to bind all origins as well and to drive origin activity of oriC3 in the closely related Sulfolobus islandicus. [93] [95] Because archaeal origins often contain several adjacent ORB elements, multiple Orc1/Cdc6 paralogs can be simultaneously recruited to an origin and oligomerize in some instances; [94] [97] however, in contrast to bacterial DnaA, formation of a higher-order initiator assembly does not appear to be a general prerequisite for origin function in the archaeal domain. [2]

Structural studies have provided insights into how archaeal Orc1/Cdc6 recognizes ORB elements and remodels origin DNA. [97] [98] Orc1/Cdc6 paralogs are two-domain proteins composed of an AAA+ ATPase module fused to a C-terminal winged-helix fold. [99] [100] [101] DNA-complexed structures of Orc1/Cdc6 revealed that ORBs are bound by an Orc1/Cdc6 monomer despite the presence of inverted repeat sequences within ORB elements. [97] [98] Both the ATPase and winged-helix regions interact with the DNA duplex but contact the palindromic ORB repeat sequence asymmetrically, which orients Orc1/Cdc6 in a specific direction on the repeat. [97] [98] Interestingly, the DUE-flanking ORB or miniORB elements often have opposite polarities, [80] [85] [94] [102] [103] which predicts that the AAA+ lid subdomains and the winged-helix domains of Orc1/Cdc6 are positioned on either side of the DUE such that they face each other. [97] [98] Since both regions of Orc1/Cdc6 associate with a minichromosome maintenance (MCM) replicative helicase, [104] [105] this specific arrangement of ORB elements and Orc1/Cdc6 is likely important for loading two MCM complexes symmetrically onto the DUE. [85] Surprisingly, while the ORB DNA sequence determines the directionality of Orc1/Cdc6 binding, the initiator makes relatively few sequence-specific contacts with DNA. [97] [98] However, Orc1/Cdc6 severely underwinds and bends DNA, suggesting that it relies on a mix of both DNA sequence and context-dependent DNA structural features to recognize origins. [97] [98] [106] Notably, base pairing is maintained in the distorted DNA duplex upon Orc1/Cdc6 binding in the crystal structures, [97] [98] whereas biochemical studies have yielded contradictory findings as to whether archaeal initiators can melt DNA similarly to bacterial DnaA. [93] [94] [107] Although the evolutionary kinship of archaeal and eukaryotic initiators and replicative helicases indicates that archaeal MCM is likely loaded onto duplex DNA (see next section), the temporal order of origin melting and helicase loading, as well as the mechanism for origin DNA melting, in archaeal systems therefore remain to be clearly established. Likewise, how exactly the MCM helicase is loaded onto DNA needs to be addressed in future studies. [2]

Origin organization, specification, and activation in eukaryotes are more complex than in the bacterial or archaeal domains and deviate significantly from the paradigm established for prokaryotic replication initiation. The large genome sizes of eukaryotic cells, which range from 12 Mbp in S. cerevisiae to 3 Gbp in humans, necessitate that DNA replication start at anywhere from several hundred (in budding yeast) to tens of thousands (in humans) of origins to complete replication of all chromosomes during each cell cycle. [27] [36] With the exception of S. cerevisiae and related Saccharomycotina species, eukaryotic origins do not contain consensus DNA sequence elements; instead, their location is influenced by contextual cues such as local DNA topology, DNA structural features, and chromatin environment. [110] [35] [37] Nonetheless, eukaryotic origin function still relies on a conserved initiator protein complex to load replicative helicases onto DNA during the late M and G1 phases of the cell cycle, a step known as origin licensing. [111] In contrast to their bacterial counterparts, replicative helicases in eukaryotes are loaded onto origin duplex DNA in an inactive, double-hexameric form, and only a subset of them (10-20% in mammalian cells) is activated during any given S phase, events that are referred to as origin firing. [112] [113] [114] The location of active eukaryotic origins is therefore determined on at least two different levels: origin licensing, which marks all potential origins, and origin firing, which selects a subset that permits assembly of the replication machinery and initiation of DNA synthesis. The extra licensed origins serve as backups and are activated only upon slowing or stalling of nearby replication forks, ensuring that DNA replication can be completed when cells encounter replication stress. [115] [116] Together, the excess of licensed origins and the tight cell cycle control of origin licensing and firing embody two important strategies to prevent under- and overreplication and to maintain the integrity of eukaryotic genomes. [2]
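A back-of-the-envelope calculation shows why large genomes need many origins. If each origin launches two forks moving at v bp per minute for an S phase lasting T minutes, one origin can replicate at most 2vT bp, so a genome of G bp needs at least ⌈G / (2vT)⌉ origins. The fork speeds and S-phase durations below are assumed round numbers chosen for illustration, not measured values from the sources cited here, and the function name is hypothetical.

```python
def min_origins(genome_bp, fork_bp_per_min, s_phase_min):
    """Idealized lower bound on origin number: every origin fires at the start
    of S phase and drives two forks that never stall, so each origin can
    replicate at most 2 * v * T base pairs."""
    per_origin_bp = 2 * fork_bp_per_min * s_phase_min
    return -(-genome_bp // per_origin_bp)  # integer ceiling division


if __name__ == "__main__":
    # Assumed round numbers, for illustration only.
    print(min_origins(3_000_000_000, fork_bp_per_min=1500, s_phase_min=480))  # 2084
    print(min_origins(12_000_000, fork_bp_per_min=1600, s_phase_min=30))      # 125
```

The origin counts quoted above exceed these bounds, which is unsurprising because the bound assumes that all origins fire at once, are evenly spaced, and never stall.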

Early studies in S. cerevisiae indicated that replication origins in eukaryotes might be recognized in a DNA-sequence-specific manner, analogous to those in prokaryotes. In budding yeast, the search for genetic replicators led to the identification of autonomously replicating sequences (ARS) that support efficient DNA replication initiation of extrachromosomal DNA. [117] [118] [119] These ARS regions are approximately 100-200 bp long and exhibit a multipartite organization, containing A, B1, B2, and sometimes B3 elements that together are essential for origin function. [120] [121] The A element encompasses the conserved 11 bp ARS consensus sequence (ACS), [122] [123] which, in conjunction with the B1 element, constitutes the primary binding site for the heterohexameric origin recognition complex (ORC), the eukaryotic replication initiator. [124] [125] [126] [127] Within ORC, five subunits are built on conserved AAA+ ATPase and winged-helix folds and co-assemble into a pentameric ring that encircles DNA. [127] [128] [129] In budding yeast ORC, DNA binding elements in the ATPase and winged-helix domains, as well as adjacent basic patch regions in some of the ORC subunits, are positioned in the central pore of the ORC ring such that they aid the DNA-sequence-specific recognition of the ACS in an ATP-dependent manner. [127] [130] By contrast, the roles of the B2 and B3 elements are less clear. The B2 region is similar to the ACS in sequence and has been suggested to function as a second ORC binding site under certain conditions, or as a binding site for the replicative helicase core. [131] [132] [133] [134] [135] Conversely, the B3 element recruits the transcription factor Abf1, although B3 is not found at all budding yeast origins and Abf1 binding does not appear to be strictly essential for origin function. [2] [120] [136] [137]
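Sequence-specific recognition of the ACS can be mimicked with an IUPAC pattern search on both strands. This minimal sketch assumes the commonly cited 11 bp ACS consensus WTTTAYRTTTW (W = A/T, Y = C/T, R = A/G). A match is necessary but far from sufficient for origin function, since the text notes that ORC recognition also involves the B1 element and ATP; the helper names are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
         "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}
IUPAC_COMPLEMENT = str.maketrans("ACGTRYWSKMN", "TGCAYRWSMKN")


def iupac_regex(motif):
    """Translate an IUPAC motif into a regular expression."""
    return "".join(IUPAC[c] for c in motif.upper())


def find_iupac(seq, motif="WTTTAYRTTTW"):
    """Return sorted (index, strand, site) matches of the motif on both strands."""
    seq = seq.upper()
    rc_motif = motif.upper().translate(IUPAC_COMPLEMENT)[::-1]
    hits = []
    for strand, m in (("+", motif), ("-", rc_motif)):
        # A zero-width lookahead lets overlapping matches be reported.
        pattern = re.compile("(?=(" + iupac_regex(m) + "))")
        for match in pattern.finditer(seq):
            hits.append((match.start(), strand, match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    print(find_iupac("GCGATTTATGTTTAGCG"))  # [(3, '+', 'ATTTATGTTTA')]
```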

Origin recognition in eukaryotes other than S. cerevisiae or its close relatives does not conform to the sequence-specific readout of conserved origin DNA elements. Efforts to isolate specific chromosomal replicator sequences more generally in eukaryotic species, either genetically or by genome-wide mapping of initiator binding or replication start sites, have failed to identify clear consensus sequences at origins. [138] [139] [140] [141] [142] [143] [144] [145] [146] [147] [148] [149] Thus, sequence-specific DNA-initiator interactions in budding yeast signify a specialized mode of origin recognition in this system rather than an archetypal mode of origin specification across the eukaryotic domain. Nonetheless, DNA replication does initiate at discrete sites that are not randomly distributed across eukaryotic genomes, arguing that alternative means determine the chromosomal location of origins in these systems. These mechanisms involve a complex interplay between DNA accessibility, nucleotide sequence skew (both AT-richness and CpG islands have been linked to origins), nucleosome positioning, epigenetic features, DNA topology and certain DNA structural features (e.g., G4 motifs), as well as regulatory proteins and transcriptional interference. [17] [18] [34] [35] [37] [150] [151] [143] [152] Importantly, origin properties vary not only between different origins in an organism and among species; some can also change during development and cell differentiation. The chorion locus in Drosophila follicle cells constitutes a well-established example of spatial and developmental control of initiation events. This region undergoes DNA-replication-dependent gene amplification at a defined stage during oogenesis and relies on the timely and specific activation of chorion origins, which in turn is regulated by origin-specific cis-elements and several protein factors, including the Myb complex, E2F1, and E2F2. [153] [154] [155] [156] [157] This combinatorial specification and multifactorial regulation of metazoan origins has complicated the identification of unifying features that determine the location of replication start sites across eukaryotes more generally. [2]
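Sequence skew such as CpG islands can be quantified with two simple statistics: the GC fraction and the observed/expected CpG ratio, (CpG count × length) / (C count × G count). The thresholds used here (at least 200 bp, GC fraction above 0.5, ratio above 0.6) are the classic Gardiner-Garden and Frommer criteria, adopted as an assumption for illustration. The text only says that CpG islands have been linked to origins, not that this test predicts them, and the helper names are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # CG cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply classic CpG-island thresholds to the whole sequence."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


if __name__ == "__main__":
    print(cpg_stats("CG" * 100))              # (1.0, 2.0)
    print(looks_like_cpg_island("AT" * 100))  # False
```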

To facilitate replication initiation and origin recognition, ORC assemblies from various species have evolved specialized auxiliary domains that are thought to aid initiator targeting to chromosomal origins or to chromosomes in general. For example, the Orc4 subunit in S. pombe ORC contains several AT-hooks that preferentially bind AT-rich DNA, [158] while in metazoan ORC the TFIIB-like domain of Orc6 is thought to perform a similar function. [159] Metazoan Orc1 proteins also harbor a bromo-adjacent homology (BAH) domain that interacts with H4K20me2-nucleosomes. [109] Particularly in mammalian cells, H4K20 methylation has been reported to be required for efficient replication initiation, and the Orc1-BAH domain facilitates ORC association with chromosomes and Epstein-Barr virus origin-dependent replication. [160] [161] [162] [163] [164] It is therefore tempting to speculate that both observations are mechanistically linked, at least in a subset of metazoa, but this possibility needs to be further explored in future studies. In addition to the recognition of certain DNA or epigenetic features, ORC also associates directly or indirectly with several partner proteins that could aid initiator recruitment, including LRWD1, PHIP (or DCAF14), and HMGA1a, among others. [33] [165] [166] [167] [168] [169] [170] [171] Interestingly, Drosophila ORC, like its budding yeast counterpart, bends DNA, and negative supercoiling has been reported to enhance DNA binding of this complex, suggesting that DNA shape and malleability might influence the location of ORC binding sites across metazoan genomes. [31] [127] [172] [173] [174] A molecular understanding of how ORC's DNA binding regions might support the readout of structural properties of the DNA duplex in metazoans, rather than of specific DNA sequences as in S. cerevisiae, awaits high-resolution structural information on DNA-bound metazoan initiator assemblies. Likewise, whether and how different epigenetic factors contribute to initiator recruitment in metazoan systems is poorly defined and is an important question that needs to be addressed in more detail. [2]

Once recruited to origins, ORC and its co-factors Cdc6 and Cdt1 drive the deposition of the minichromosome maintenance 2-7 (Mcm2-7) complex onto DNA. [111] [175] Like the archaeal replicative helicase core, Mcm2-7 is loaded as a head-to-head double hexamer onto DNA to license origins. [112] [113] [114] In S phase, Dbf4-dependent kinase (DDK) and cyclin-dependent kinase (CDK) phosphorylate several Mcm2-7 subunits and additional initiation factors to promote the recruitment of the helicase co-activators Cdc45 and GINS, DNA melting, and ultimately bidirectional replisome assembly at a subset of the licensed origins. [28] [176] In both yeast and metazoans, origins are free or depleted of nucleosomes, a property that is crucial for Mcm2-7 loading, indicating that the chromatin state at origins regulates not only initiator recruitment but also helicase loading. [144] [177] [178] [179] [180] [181] A permissive chromatin environment is further important for origin activation and has been implicated in regulating both origin efficiency and the timing of origin firing. Euchromatic origins typically contain active chromatin marks, replicate early, and are more efficient than late-replicating, heterochromatic origins, which conversely are characterized by repressive marks. [27] [179] [182] Not surprisingly, several chromatin remodelers and chromatin-modifying enzymes have been found to associate with origins and certain initiation factors, [183] [184] but how their activities impact different replication initiation events remains largely obscure. Remarkably, cis-acting “early replication control elements” (ERCEs) have recently also been identified that help regulate replication timing and influence 3D genome architecture in mammalian cells. [185] Understanding the molecular and biochemical mechanisms that orchestrate this complex interplay between 3D genome organization, local and higher-order chromatin structure, and replication initiation is an exciting topic for further studies. [2]
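The two-step logic of licensing (helicase loading in late M and G1) and firing (activation of a subset of licensed origins in S phase) can be captured as a toy state machine that enforces once-per-cycle initiation. This is a minimal sketch: the class and method names are hypothetical, and the model deliberately omits the molecular machinery (ORC, Cdc6, Cdt1, DDK, CDK, Cdc45, GINS), dormant backup origins, and the fate of licensed origins that never fire.

```python
from enum import Enum, auto


class Phase(Enum):
    LATE_M = auto()
    G1 = auto()
    S = auto()
    G2 = auto()


class OriginState(Enum):
    UNLICENSED = auto()
    LICENSED = auto()  # inactive Mcm2-7 double hexamer loaded
    FIRED = auto()     # helicase activated, replisomes assembled


class Origin:
    """Toy once-per-cycle origin: license in late M/G1, fire only in S."""

    def __init__(self, name):
        self.name = name
        self.state = OriginState.UNLICENSED

    def license(self, phase):
        # Helicase loading is allowed only in late M or G1, on an unlicensed origin.
        if phase in (Phase.LATE_M, Phase.G1) and self.state is OriginState.UNLICENSED:
            self.state = OriginState.LICENSED
            return True
        return False

    def fire(self, phase):
        # Activation is allowed only in S phase, and only for a licensed origin.
        if phase is Phase.S and self.state is OriginState.LICENSED:
            self.state = OriginState.FIRED
            return True
        return False

    def reset_for_next_cycle(self):
        # After division, the origin can be licensed again.
        self.state = OriginState.UNLICENSED


if __name__ == "__main__":
    ori = Origin("ori1")
    print(ori.license(Phase.G1), ori.fire(Phase.S), ori.fire(Phase.S))  # True True False
```

Because firing consumes the licensed state and licensing is refused outside late M and G1, the same origin cannot initiate twice within one cycle.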

Why have metazoan replication origins diverged from the DNA sequence-specific recognition paradigm that determines replication start sites in prokaryotes and budding yeast? Observations that metazoan origins often co-localize with promoter regions in Drosophila and mammalian cells, and that replication-transcription conflicts due to collisions of the underlying molecular machineries can lead to DNA damage, suggest that proper coordination of transcription and replication is important for maintaining genome stability. [139] [141] [143] [146] [186] [20] [187] [188] Recent findings also point to a more direct role of transcription in influencing the location of origins, either by inhibiting Mcm2-7 loading or by repositioning loaded Mcm2-7 on chromosomes. [189] [152] Sequence-independent (but not necessarily random) initiator binding to DNA additionally allows for flexibility in specifying helicase loading sites and, together with transcriptional interference and the variability in activation efficiencies of licensed origins, likely determines origin location and contributes to the co-regulation of DNA replication and transcriptional programs during development and cell fate transitions. Computational modeling of initiation events in S. pombe, as well as the identification of cell-type-specific and developmentally regulated origins in metazoans, are in agreement with this notion. [140] [148] [190] [191] [192] [193] [194] [152] However, a large degree of flexibility in origin choice also exists among different cells within a single population, [143] [149] [191] although the molecular mechanisms that lead to this heterogeneity in origin usage remain ill-defined. Mapping origins in single cells in metazoan systems and correlating these initiation events with single-cell gene expression and chromatin status will be important to elucidate whether origin choice is purely stochastic or controlled in a defined manner. [2]
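The question of whether origin choice is purely stochastic can be framed with a simple null model in which each licensed origin fires independently in every cell with its own efficiency. The sketch below simulates that assumed model with hypothetical efficiency values and function names. Under this model, cell-to-cell heterogeneity arises from chance alone, which gives a baseline that single-cell origin maps could be compared against.

```python
import random


def simulate_firing(efficiencies, n_cells, seed=0):
    """Independent-firing null model: in each simulated cell, origin i fires
    with probability efficiencies[i]. Returns one tuple of booleans per cell."""
    rng = random.Random(seed)
    return [tuple(rng.random() < p for p in efficiencies) for _ in range(n_cells)]


def firing_frequencies(patterns):
    """Fraction of cells in which each origin fired."""
    n_cells = len(patterns)
    return [sum(cell[i] for cell in patterns) / n_cells
            for i in range(len(patterns[0]))]


if __name__ == "__main__":
    # Hypothetical efficiencies for three licensed origins.
    patterns = simulate_firing([0.9, 0.5, 0.1], n_cells=1000, seed=1)
    print(firing_frequencies(patterns))  # close to the input efficiencies
    print(len(set(patterns)), "distinct firing patterns across cells")
```

Population-level firing frequencies can look well defined even when no two cells need to use the same set of origins, which is why single-cell measurements are needed to distinguish this null model from controlled origin choice.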

