Why is the 3'UTR region highly methylated in most of the human genes?

Most human genes are found to be highly methylated in their 3'UTR (0.8-0.9%). Is there a specific reason for this?

According to Choi et al. Genome Biology 2009, 10:R89, DNA methylation at both coding boundaries may regulate transcription elongation and stabilize splicing by reducing the occurrences of exon skipping.

From the abstract:

Here we report a genome-wide observation of distinct peaks of nucleosomes and methylation at both ends of a protein coding unit. Elongating polymerases tend to pause near both coding ends immediately upstream of the epigenetic peaks, causing a significant reduction in elongation efficiency. Conserved features in underlying protein coding sequences seem to dictate their evolutionary conservation across multiple species. The nucleosomal and methylation marks are commonly associated with high sequence-encoded DNA-bending propensity but differentially with CpG density. As the gene grows longer, the epigenetic codes seem to be shifted from variable inner sequences toward boundary regions, rendering the peaks more prominent in higher organisms.

Their data (figures 1 and S2), however, do not support a generalized increase in methylation across 3' UTR regions in human T cells, mouse liver, yeast or flies.

Genome-wide functional analysis of human 5' untranslated region introns

Approximately 35% of human genes contain introns within the 5' untranslated region (UTR). Introns in 5'UTRs differ from those in coding regions and 3'UTRs with respect to nucleotide composition, length distribution and density. Despite their presumed impact on gene regulation, the evolution and possible functions of 5'UTR introns remain largely unexplored.


We performed a genome-scale computational analysis of 5'UTR introns in humans. We discovered that the most highly expressed genes tended to have short 5'UTR introns rather than having long 5'UTR introns or lacking 5'UTR introns entirely. Although we found no correlation in 5'UTR intron presence or length with variance in expression across tissues, which might have indicated a broad role in expression-regulation, we observed an uneven distribution of 5'UTR introns amongst genes in specific functional categories. In particular, genes with regulatory roles were surprisingly enriched in having 5'UTR introns. Finally, we analyzed the evolution of 5'UTR introns in non-receptor protein tyrosine kinases (NRTK), and identified a conserved DNA motif enriched within the 5'UTR introns of human NRTKs.


Our results suggest that human 5'UTR introns enhance the expression of some genes in a length-dependent manner. While many 5'UTR introns are likely to be evolving neutrally, their relationship with gene expression and overrepresentation among regulatory genes, taken together, suggest that complex evolutionary forces are acting on this distinct class of introns.

2. Detection and Quantification of Inosine

Since the discovery of inosine by means of laborious purifications of specific RNA species, followed by selective RNA degradation and chromatographic studies [1], a number of techniques now exist for mapping inosine modifications. All these strategies have their strengths and limitations, and their preferential use is dependent on the RNA species of interest and the biological/biochemical question that needs to be addressed. Molecular inosine can be readily detected and quantified using standard biochemical methods that mostly rely on conversion of inosine into hypoxanthine. Detection of inosine within RNA species, on the other hand, is more challenging and will be the focus of this section.

2.1. Chromatography-Based Methods

Chromatography is still used today to detect and quantify inosines. It is frequently used when working with in vitro-derived samples (e.g., synthetic or in vitro-transcribed RNAs bearing inosine modifications). The RNA of interest is usually radiolabeled, digested to single nucleotides, and resolved by thin-layer chromatography [14]. This is a semi-quantitative and cost-effective method but cannot be used in a high-throughput manner and does not give information on the location of the modified residue.

To study inosine modifications in in vivo-derived samples (e.g., inosine-containing RNAs derived from cellular extracts), liquid chromatography coupled with mass spectrometry (LC-MS/MS) can be used [15]. This is a highly quantitative non-radioactive method, but it is also low throughput, requires previous purification (in large amounts) of the RNA species of interest, does not give positional information about the modification, and necessitates expensive specialized equipment.

2.2. Reverse Transcription (RT)-Based Methods

Several methods for inosine detection and quantification are based on reverse transcription (RT) of RNAs and PCR amplification. Inosine is structurally a guanosine analogue (Figure 1A) that reverse transcriptases read as G instead of the A that it derives from. This artifact can be exploited to detect and quantify inosine by calculating the A-to-G mismatch proportion within PCR products (amplicons), while determining the position of the modifications. A simple, fast, semi-quantitative, and cost-effective method to characterize these amplicons is restriction fragment length polymorphism (RFLP), which can be used when the A-to-I(G) conversion creates or abolishes a restriction enzyme recognition site [16,17,18]. This method allows the evaluation of multiple samples at once but is low throughput in terms of the number of A-to-I edited sites that can be studied.
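The A-to-G mismatch proportion described above can be sketched in a few lines. This is only an illustration: the function name `editing_level` and the read counts are hypothetical, and it assumes the A and G read counts at a genomically encoded A position have already been obtained from the sequenced amplicons.

```python
def editing_level(a_count: int, g_count: int) -> float:
    """Fraction of reads that show G at a position encoded as A in the genome."""
    total = a_count + g_count
    if total == 0:
        raise ValueError("no A or G reads cover this site")
    return g_count / total


# Hypothetical (A reads, G reads) per site; illustrative numbers only.
sites = {"site_1": (90, 10), "site_2": (50, 50), "site_3": (100, 0)}
levels = {name: editing_level(a, g) for name, (a, g) in sites.items()}
print(levels)
```

A proportion near zero is compatible with an unedited site, while intermediate values suggest partial editing. Sequencing errors and genomic A-to-G variants can give the same signal, so a ratio alone is not proof of editing.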

RT-PCR products can also be sequenced. This can be done by standard Sanger sequencing when only inosines at specific sites and on particular RNA species are evaluated [17,18], and it is a semi-quantitative and inexpensive approach. Most frequently, however, high-throughput RNA sequencing (RNA-seq) is used instead. This is a powerful and highly quantitative technique that allows the identification of multiple inosine sites in a given sample [19,20,21]. However, the method is expensive and requires a good knowledge of analytical computational tools.

Sequencing errors, or A-to-G genomic mutations, may lead to false-positive inosine assignments. To validate whether an A-to-G mutated site is indeed an A-to-I edited site, inosine chemical erasing (ICE)-Seq has been developed [22]. In this method, total RNA is treated with acrylonitrile prior to RNA-seq. This compound cyanoethylates inosines, and the resulting N1-cyanoethylinosines block RT. By comparing RNA-seq data obtained from the same sample with and without acrylonitrile treatments, inosine sites can be unequivocally detected. This method, however, cannot detect sites with 100% A-to-I editing or multiple inosine modifications located in close range.

2.3. Other Methods

Specific RNases can be used to cleave inosine-containing RNAs and resolve the digested RNA by gel electrophoresis. These methods are low throughput and not fully quantitative but are simple, inexpensive, and particularly useful when inosine cannot be readily detected by RT-based methods (e.g., certain tRNA species) [23].

For example, RNase T1 is an enzyme that cleaves both guanosine and inosine. It is possible to treat inosine-containing RNA with glyoxal/borate to protect guanosines (but not inosines) from cleavage by RNase T1. In this manner, only inosine-containing sites will be cleaved and can be readily detected [24,25]. Alternatively, endonuclease V (EndoV) specifically cleaves single-stranded RNA at inosine sites, generating fragments that can be detected by Northern blotting [26,27]. EndoV has also been used to develop splinted ligation-based inosine detection (SL-ID). In this method, RNA is treated with EndoV and the resulting (inosine-containing) cleavage products are captured by specific bridge oligonucleotides and splint-ligated to a radiolabeled ligation oligonucleotide, prior to the analysis of the reaction products by gel electrophoresis and autoradiography [23].

More recently, developments in Nanopore technologies have allowed the detection and quantification of inosine on native RNAs by high-throughput sequencing without the need for RT [28].

Regulation of mRNA stability

The turnover of mRNAs is another crucial step in post-transcriptional regulation of gene expression, as changes in mRNA abundance may alter the expression of specific genes by affecting the abundance of the corresponding protein. Several mechanisms have been proposed to describe how mRNA degradation takes place: decay can be preceded by shortening or removal of the poly(A) tail at the 3' end and/or by removal of the m7G cap at the 5' end [33]. The turnover of an mRNA is mostly regulated by cis-acting elements located in the 3' UTR, such as the AU-rich elements (AREs), which promote mRNA decay in response to a variety of specific intra- and extra-cellular signals. AREs have been experimentally grouped into three classes: class I and II AREs are characterized by the presence of multiple copies of the pentanucleotide AUUUA, which is absent from class III AREs [34]. Class I AREs control the cytoplasmic deadenylation of mRNAs by the degradation of all parts of the poly(A) tail at the same rate, generating intermediates with poly(A) tails of 30-60 nucleotides, which are then completely degraded. These elements are found mainly in mRNAs encoding nuclear transcription factors such as c-Fos and c-Myc (the products of 'fast response' genes) and also in mRNAs for some cytokines, such as interleukins 4 and 6. The presence of one or more copies of the pentanucleotide AUUUA next to a U-rich region is the structural characteristic of class I AREs. Class II AREs mediate asynchronous cytoplasmic deadenylylation, in other words the poly(A) tail is degraded at different rates in different transcripts, generating mRNAs without poly(A) tails. Among mRNAs containing this signal are those encoding the cytokines GM-CSF, interleukin 2, tumor necrosis factor α (TNF-α) and interferon-α. Class II AREs are characterized by tandem reiterations of the AUUUA pentamer, and an AU-rich region is usually found upstream of these repeats. 
The mRNAs containing class III AREs, such as those encoding c-Jun, do not contain the pentanucleotide AUUUA but have only a U-rich segment; they show degradation kinetics similar to those of mRNAs containing class I AREs.
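The sequence features that separate the three ARE classes can be turned into a rough screening heuristic. The sketch below is an assumption-laden simplification, not an established classifier: the function name `classify_are` is hypothetical, "tandem reiteration" is approximated as the overlapping nonamer AUUUAUUUA, and a "U-rich segment" is arbitrarily taken to be a run of five or more U residues.

```python
import re


def classify_are(utr: str) -> str:
    """Crude ARE class guess from a 3' UTR sequence (DNA or RNA letters)."""
    rna = utr.upper().replace("T", "U")
    # Count AUUUA pentamers, including overlapping copies.
    pentamers = len(re.findall(r"(?=AUUUA)", rna))
    if pentamers == 0:
        # Assumed threshold: a run of >= 5 U counts as a U-rich segment.
        return "class III-like" if re.search(r"U{5,}", rna) else "no ARE detected"
    if "AUUUAUUUA" in rna:
        # Tandem (overlapping) reiteration of the pentamer.
        return "class II-like"
    return "class I-like"


print(classify_are("GGAUUUACCUUUUUUGG"))
print(classify_are("CCAUUUAUUUAUUUACC"))
print(classify_are("GGCUUUUUUCGG"))
```

The classes were defined experimentally from decay kinetics, so a sequence-only guess like this can at most suggest candidates for follow-up.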

Degradation of mRNAs can also take place following endonuclease activity, in a mechanism independent of both deadenylation and decapping. Such a mechanism has been observed for the mRNA encoding the transferrin receptor, a protein that mediates iron transfer in the cell. The degradation pathway of this mRNA involves an endonucleolytic cleavage in the 3' UTR region that is mediated by the recognition of IRE structures and is regulated by the level of intracellular iron [35].

Upstream initiation codons and ORFs may also play a role in mRNA decay through the nonsense-mediated mRNA decay (NMD) pathway. The signal that triggers NMD is a nonsense codon followed by a splicing junction (the junction left between two exons after an intron is removed) [36]; the presence of the splicing junction may be how normal stop codons are distinguished from premature termination codons. Indeed, normal stop codons and the 3' UTR are usually located in the last exon of the sequence and thus are not followed by a splicing junction. Exon junctions are recognized because a marker protein binds to the intron-containing transcript in the nucleus, remains bound to the exon junction after the splicing event has finished and is translocated to the cytoplasm with the processed mRNA [11]. The translation machinery usually displaces the marker protein, preventing the degradation of wild-type mRNAs. But if the ribosome encounters a stop codon that is either premature or due to the presence of an upstream ORF, it disassembles and the marker proteins at the exon junction direct the aberrant mRNA towards NMD [37]. In Saccharomyces cerevisiae (which uses a downstream exonic element, DSE, as the second signal that triggers NMD), mRNAs containing functionally active upstream ORFs, like those encoding GCN4 or YAP1, are not degraded through the NMD pathway because they contain mRNA-specific stabilizer sequence elements between the upstream ORF and the coding sequence that prevent the activation of the NMD pathway by interacting with the RNA-binding ubiquitin ligase Pub1 [38].
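The positional rule in this paragraph (a stop codon followed by at least one exon junction) can be written as a small check. This is a minimal sketch under stated assumptions: the function name `is_nmd_candidate` and the coordinates are hypothetical, all positions are given on the spliced mRNA, and any minimum-distance requirement between stop codon and junction is deliberately ignored because the text does not specify one.

```python
from typing import Sequence


def is_nmd_candidate(stop_codon_end: int, exon_junctions: Sequence[int]) -> bool:
    """True if any exon-exon junction lies downstream of the stop codon.

    Positions are 0-based coordinates on the spliced mRNA (an assumption
    made for this example).
    """
    return any(junction > stop_codon_end for junction in exon_junctions)


junctions = [120, 450, 700]  # hypothetical exon-exon junction positions

# Normal stop codon in the last exon: no junction follows it.
print(is_nmd_candidate(900, junctions))
# Premature stop codon in the second exon: junctions at 450 and 700 follow it.
print(is_nmd_candidate(300, junctions))
```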

Upstream ORFs can also regulate mRNA stability through an NMD-independent mechanism. The 5' UTR of the S. cerevisiae gene YAP2 contains two upstream ORFs that inhibit ribosomal scanning and promote mRNA decay [26]. The destabilizing effect relies on the termination codon context, which modulates translation efficiency and mRNA stability. Table 5 reports some genes in which upstream ORFs have been demonstrated to affect gene expression.

Several studies have provided evidence that many hnRNPs not only function in the nucleus but also are involved in the control of mRNA fate in the cytoplasm [10] and can regulate translation, mRNA stability and cytoplasmic localization [37]. One example is the regulation of the amyloid precursor protein (APP); increasing the level of APP is an important contributing factor to the development of Alzheimer's disease. Stability of APP mRNA is dependent on a highly conserved 29-nucleotide element located in the 3' UTR that interacts with several cytoplasmic RNA-binding proteins [39]. Very interestingly, although some of these proteins are fragments of nucleolin (which is known to shuttle between the nucleus and cytoplasm), two proteins of 39 kDa and 38 kDa are subunits of hnRNP C, seen in this study for the first time in the cytoplasm [40].

Primate-specific Alus constitute 11% of the human genome, with >1 million copies, and their genomic distribution is biased toward gene-rich regions.

The functions of Alus are highly associated with their sequence and structural features.

Alus can regulate gene expression by serving as cis elements.

Pol-III-transcribed free Alus mainly affect Pol II transcription and mRNA translation in trans.

Embedded Alus within Pol-II-transcribed mRNAs can impact their host gene expression through the regulation of alternative splicing, and RNA stability and translation.

Nearly half of annotated Alus are located in introns; RNA pairing formed by Alus in opposite orientations across introns promotes circRNA biogenesis.

Alu elements belong to the primate-specific SINE family of retrotransposons and constitute almost 11% of the human genome. Alus are transcribed by RNA polymerase (Pol) III and are inserted back into the genome with the help of autonomous LINE retroelements. Since Alu elements are preferentially located near to or within gene-rich regions, they can affect gene expression by distinct mechanisms of action at both DNA and RNA levels. In this review we focus on recent advances of how Alu elements are pervasively involved in gene regulation. We discuss the impacts of Alu DNA sequences that are in close proximity to genes, Pol-III-transcribed free Alu RNAs, and Pol-II-transcribed Alu RNAs that are embedded within coding or noncoding RNA transcripts. The recent elucidation of Alu functions reveals previously underestimated roles of these selfish or junk DNA sequences in the human genome.

Dynamic DNA methylation patterns across the mouse and human IL10 genes during CD4+ T cell activation: influence of IL-27

IL-10 plays a critical role in controlling inflammation and the anti-inflammatory functions of IL-10 are regulated based on its coordinated expression from various cellular sources, most notably T cells. Although nearly all CD4+ subpopulations can express IL-10, surprisingly little is known about the molecular mechanisms which control IL-10 induction, particularly in humans. To examine the regulation of human IL-10 expression, we created the hIL10BAC transgenic mouse. As previously reported, we observed conservation of myeloid-derived IL-10 expression but found that human IL-10 was only weakly expressed in splenic CD4+ T cells from hIL10BAC mice. Since DNA methylation is an important determinant of gene expression profiles, we assessed the patterns of DNA methylation in the human and mouse IL10 genes in naïve and activated CD4+ T cells. Across mouse and human IL10 there were no obvious patterns of CpG methylation in naïve CD4+ T cells following polyclonal activation. Overall however, the human IL10 gene had significantly higher levels of DNA methylation. Interestingly, coculture with the IL-10-inducing cytokine IL-27 led to a site-specific reduction in methylation of the mouse but not human IL10 gene. Demethylation was specifically localized to an intronic site adjacent to a known regulatory region. Our findings indicate that while the mouse and human IL10 genes undergo variable changes in DNA methylation during CD4+ T cell activation, IL-27 appears to influence DNA methylation in a particular intronic region thus associating with IL-10 expression.


3 common polysaccharides:
1. glycogen (all glucose, animal storage, alpha-1,4 for linear, alpha-1,6 for branching)
2. starch (all glucose, plant storage, alpha-1,4)
3. cellulose (all glucose, plant structure, beta-1,4)

hydrophobic, since C-C and C-H are nonpolar

all double bonds are cis (z)

only even numbered fatty acids

found concentrated in lipid rafts

increases fluidity at low temperatures
decreases fluidity at high temperatures

5' -> 3' synthesis (refers to carbon number on which bonds form)

nucleotides connected by phosphodiester bonds- the 3' OH on the sugar of one nucleotide connects to the 5' phosphate of the next nucleotide, forming a P-O bond

A and T have 2 H bonds
C and G have 3 H bonds

melting temperature- temp at which 1/2 of the DNA is denatured (about half of the H bonds between strands are broken), above it the rest will denature

CG bonds are more stable than AT bonds

more bonds and longer strands means higher melting temperature
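The dependence of melting temperature on base composition and length can be illustrated with the Wallace rule of thumb for short oligonucleotides, Tm ≈ 2 °C per A/T plus 4 °C per G/C. The rule is not part of these notes and is only a rough estimate for short oligos (around 14 nt or fewer); the function name `wallace_tm` is hypothetical.

```python
def wallace_tm(seq: str) -> int:
    """Rough melting temperature in deg C by the Wallace rule (short oligos only)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    # G/C pairs (3 H bonds) are weighted twice as heavily as A/T pairs (2 H bonds).
    return 2 * at + 4 * gc


print(wallace_tm("ATATATATAT"))    # AT-only 10-mer
print(wallace_tm("GCGCGCGCGC"))    # GC-only 10-mer, higher estimate
print(wallace_tm("GCGCGCGCGCGC"))  # longer GC-only oligo, higher still
```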

1. methylation- protection from restriction enzymes (they chop up viral DNA which is not methylated)
2. supercoiling- in eukaryotes too, most DNA is negatively supercoiled, loops are looped/unlooped using gyrase and topoisomerases

eukaryotes- several linear chromosomes

euchromatin- open chromatin, increased expression, genes that are continuously expressed are found here

contain repetitive sequences

other amino acids exist, but no codons for them

transition mutation- A <-> G, C <-> T
transversion mutation- purine <-> pyrimidine (A or G <-> C or T)
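The two definitions above amount to asking whether both bases belong to the same chemical class. A minimal sketch for DNA bases, with the hypothetical helper name `classify_substitution`:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Label a single-base DNA substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases from A, C, G, T")
    # Same class (purine<->purine or pyrimidine<->pyrimidine) is a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"))
print(classify_substitution("C", "T"))
print(classify_substitution("A", "C"))
```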

endogenous damage:
1. reactive oxygen species- oxidize DNA
2. crosslinked bases
3. physical damage

exogenous damage:
1. UV radiation-pyrimidine dimers
2. X rays- double stranded breaks, translocations
3. chemicals- intercalation

inverted repeats- points of recognition for transposon

types of transposons:
1. IS element (codes for transposase)
2. complex transposon (carries additional genes)
3. composite transposon (flanked by 2 IS elements)

two transposons:
1. in same direction- DNA folds, deletes the middle gene
2. in opposite directions- DNA folds, inverts middle gene, shouldn't cause any problems if regulatory genes flipped

mismatch repair- after DNA replication, the side with more methyl groups must be the original sequence, so replace sequence on the other side

excision repair- before DNA replication, replace a single base

homologous recombination (homology-directed repair)- after DNA replication, repair double-stranded breaks using sister chromatid as template, homologous crossover

non-homologous end joining- before replication, repair double-stranded breaks, loses some of the sequence

if non-homologous end joining messes up, translocations can occur causing gene fusion

DNA is transcribed/read 3' -> 5'
mRNA is synthesized 5' -> 3'

semi-conservative, leading strand and lagging strand pass on to separate DNA strands

prevent damage to ends of linear chromosomes

3 types of RNA:
1. rRNA- ribosomal, skeleton of ribosomes
2. mRNA- messenger, feeds into translation
3. tRNA- transfer, carries AA to ribosomes

template strand- anti-sense strand, RNA polymerase attaches
coding strand- sense strand, same code as the resulting mRNA, not transcribed
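The relationship between template strand, coding strand and mRNA can be checked with a short example. This is a sketch with made-up sequence and hypothetical function names; it assumes the template is written 3' -> 5' so that the outputs read 5' -> 3' from left to right.

```python
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """mRNA (5' -> 3') made from a template strand written 3' -> 5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())


def coding_strand(template_3to5: str) -> str:
    """Coding (sense) strand, 5' -> 3', complementary to the template."""
    return "".join(DNA_TO_DNA[base] for base in template_3to5.upper())


template = "TACGGT"              # hypothetical template, written 3' -> 5'
mrna = transcribe(template)      # reads 5' -> 3'
coding = coding_strand(template)
print(mrna, coding)
# The mRNA carries the same code as the coding strand, with U in place of T.
print(mrna == coding.replace("T", "U"))
```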

RNA polymerase attaches to promoter on template strand, which is next to operator and START site

different genes do not have to be close to each other or on the same chromosome to be regulated together

environment determines gene expression, ex: addition of new sugar will induce expression of new enzymes to digest that sugar

Jacob-Monod model- describes lac operon

promoter (binds activator) -> operator (binds repressor) -> lac genes, trigger metabolism of lactose

lactose absent- repressor binds
lactose present- repressor leaves

glucose absent- activator binds
glucose present- activator leaves

spliceosome- contains snRNPs, which contain snRNA, machinery that splices

eukaryotes and prokaryotes have different ribosomes

ribosomes are synthesized in the nucleolus in eukaryotes

prokaryotes- large subunit (50S), small subunit (30S)
eukaryotes- large subunit (60S), small subunit (40S)

1. peptide bond formed between A and P amino acids
2. P amino acid then leaves P tRNA
3. mRNA translocates to move A amino acid to P site

STOP codon- no tRNA, instead binds release factor so final amino acid leaves its tRNA

5'/3'-UTR- untranslated regions at each end of mRNA, assist regulation of translation

covalent modification- phosphorylation, disulfide bridges

variable portion- both tips on top of Y shape, recognizes antigen
constant portion- bottom of Y shape, differs between classes/isotypes

made of protein (capsid is usually icosahedral or helical and tail fibers) and nucleic acids (DNA or RNA, ss or ds)

general life cycle:
1. attachment- receptor specificity
2. injection- deliver genome to host cell

1. ssRNA is template for mRNA
2. RNA dependent RNA polymerase creates complementary template to replicate more of original ssRNA
3. template is translated by host cell ribosomes to produce capsids and viral enzymes

(+)RNA lysogenic (retrovirus like HIV):
1. ssRNA is the same as the mRNA
2. RNA dependent DNA polymerase (reverse transcriptase) converts ssRNA to ssDNA
3. host DNA polymerase converts ssDNA to dsDNA
4. dsDNA inserts into host cell genome, provirus
5. transcribed to replicate original ssRNA and viral enzymes

bad prions are produced:
1. spontaneous mutation (mad cow disease)
2. gene can be inherited (fatal familial insomnia)
3. ingestion of diseased tissue (kuru)

Hepatitis D is a viroid that must coinfect with Hep B, which is caused by a regular virus

cell membranes:
1. gram-positive bacteria- cell wall made of peptidoglycan surrounds cell membrane, stains purple
2. gram-negative bacteria- peptidoglycan cell wall surrounded by inner and outer cell membranes, stains pink, does conjugation
3. flagella- basal structure attached to cell wall/membrane, with a hook that whips around filament
4. no cholesterol

outmost surface has specific proteins that trigger a unique immune response, some foreign cells can vary the proteins presented on the surface to bypass the immune system

energy source: photo is light, chemo is chemical compounds
carbon source: auto is CO2, hetero is other organisms

1. photoautotrophs- plants
2. chemoheterotrophs- animals
3. photoheterotrophs- carnivorous plants
4. chemoautotrophs- archaebacteria

auxotrophs- cannot synthesize a key molecule to live:
1. arg(-)- can't make arginine
2. leu(-)- can't make leucine
3. lac(-)- can't use lactose

conjugation is a feature of gram-negative bacteria

F factor- circular DNA element that encodes fertility gene, F+ is male, F- is female

conjugation bridge- after male cell produces sex pili and contacts female cell, bridge forms

F factor is replicated in F+ cell and transferred to F- cell

high frequency of recombination (Hfr cell)- F factor integrated into genome, so conjugation can transfer other parts of the bacterial genome

if nuclear/mitochondrial/peroxisomal protein, finish translation in cytosol

if secreted/transmembrane/lysosomal protein (like an antibody), finish translation in rough ER

if transmembrane protein, signal sequence will remain as transmembrane region, otherwise it acts as anchor that is removed later

if nuclear protein, nuclear localization signal

if ER protein, retrograde transport brings it back to ER from Golgi

four colligative properties:
1. freezing point depression occurs with more ions and higher molality
2. boiling point elevation occurs with more ions and higher molality
3. vapor pressure depression occurs with increased solute
4. osmotic pressure elevation occurs with increased solute, temperature, molarity

bp elevation and fp depression are analogous to cholesterol's temperature dependent role in the plasma membrane

osmosis- movement of water down its concentration gradient

osmotic pressure- water moves to high concentration of solute particles, water would move to 1 M NaCl over 1M sucrose

count the number of ions, don't just look at the concentration!
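The "count the ions" point can be made concrete with the van 't Hoff relation for osmotic pressure, π = iMRT. This is an idealized sketch: it assumes complete dissociation (i = 2 for NaCl, i = 1 for sucrose), ideal dilute-solution behavior that real 1 M solutions only approximate, and a temperature of 298.15 K chosen for illustration. The function name is hypothetical.

```python
R_L_ATM = 0.082057  # gas constant in L*atm/(mol*K)


def osmotic_pressure_atm(molarity: float, vant_hoff_i: float,
                         temp_k: float = 298.15) -> float:
    """Ideal osmotic pressure in atm: pi = i * M * R * T."""
    return vant_hoff_i * molarity * R_L_ATM * temp_k


nacl = osmotic_pressure_atm(1.0, 2)     # NaCl -> Na+ + Cl-, two particles
sucrose = osmotic_pressure_atm(1.0, 1)  # sucrose does not dissociate
print(round(nacl, 2), round(sucrose, 2))
print(nacl / sucrose)  # same molarity, twice the particles
```

Under these assumptions the NaCl solution has twice the osmotic pressure of the sucrose solution at the same molarity, which is why water moves toward the 1 M NaCl side.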

simple diffusion:
1. directly cross membrane
2. works well for small hydrophobic molecules like CO2, O2
3. also works for larger planar and hydrophobic molecules

facilitated diffusion- uses helping protein, works well for small hydrophilic molecules like glucose

primary- directly use ATP
Na+/K+ ATPase:
1. pumps 3 Na+ out and 2 K+ in, uses 1 ATP
2. maintain osmotic balance
3. establish electrical gradient
4. set up sodium gradient for secondary active transport

secondary- use ATP to establish electrochemical gradient, use gradient to drive transport
Na+/glucose symporter:
1. glucose and Na transported in from lumen
2. powered by Na gradient, created by Na/K ATPase

activates adenylyl cyclase

process ATP into cAMP to amplify effect

cAMP is a secondary messenger, activates cAMP-dependent protein kinases like protein kinase A, which phosphorylate enzymes to activate them

1. actin, branching from centrosome
2. two actin fibers twist together to form SMALL tubes
3. muscle contraction (myosin, troponin, Ca2+), cytokinesis, adherent and tight junctions
4. some cell mobility
5. myosin motor protein

intermediate filaments:
1. different proteins
2. MEDIUM tubes
3. structure

desmosomes- intermediate filaments

tight junctions- actin, seal lumen to separate environments (epithelial barriers like blood/lumen in gut)

G1/S checkpoint- tightly regulated, take inventory of nucleotides, enzymes, nutrients for DNA replication, sent to G0 senescence if doesn't pass

G2/M checkpoint- ensure DNA replication complete, check for mutations

cells stuck at a checkpoint will just continue growing larger

1. spindle fibers attach to centromeres at kinetochores
2. align chromosomes at plate

1. sister chromatids separate
2. cytokinesis starts

1. decondense DNA
2. reform nucleus
3. break down spindle
4. finish cytokinesis

1. actin helps split cell at cleavage furrow

2. tumor suppressor genes- code for proteins that slow down cell cycle, repair DNA, and trigger apoptosis, p53 is guardian angel

extracellular death signals- surround cell apoptosis
intracellular death signal- tumor suppressor, virus

apoptosis- triggered by internal factors, causes cell shrinkage, doesn't affect other cells

prophase I:
1. homologous chromosomes pair up to form tetrads, connected by synaptonemal complex
2. recombination/crossing-over occurs between homologous pairs
3. chromosomes condense, nuclear envelope breaks down, spindles form
4. longest phase!

metaphase I:
1. tetrads align along metaphase plate

anaphase I:
1. homologous pairs separate, sister chromatids remain together
2. begin cytokinesis

telophase I:
1. chromosomes decondense, nuclear envelope forms, spindles break down
2. cytokinesis ends
3. now considered haploid, since each cell has single set

meiosis II:
1. identical to mitosis, but with haploid cells
2. oocyte/spermatocyte is formed, haploid

can form 2^n different gametes, n = haploid number or how many chromosomes
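The 2^n rule counts only the combinations produced by independent assortment of homologous pairs, so crossing over would add further variety. A quick worked example, assuming the human haploid number n = 23 and a hypothetical function name:

```python
def gamete_combinations(haploid_number: int) -> int:
    """Number of chromosome combinations from independent assortment alone."""
    if haploid_number < 0:
        raise ValueError("haploid number must be non-negative")
    # Each homologous pair contributes one of two chromosomes to a gamete.
    return 2 ** haploid_number


print(gamete_combinations(2))
print(gamete_combinations(23))
```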


The mutation rate is an important determinant of evolutionary dynamics. Because the mutation rate determines the rate of appearance of beneficial and deleterious mutations, it is subject to second-order selection. The mutation rate varies between and within species and populations, is increased under stress, and is genetically controlled by mutator alleles. The mutation rate may also vary among genetically identical individuals: for example, empirical evidence from bacteria suggests that the mutation rate may be affected by translation errors and expression noise in various enzymes and proteins. Importantly, this variance may be heritable via transgenerational epigenetic inheritance. Here we investigate how the inheritance mode of mutation rates affects the rate of adaptive evolution. We model an asexual population with two mutation rate phenotypes, non-mutator and mutator. An offspring may switch from its parental phenotype to the other phenotype. The rate of switching between the phenotypes was allowed to span a range of values such that the mutation rate can be interpreted as a genetically inherited trait when the switching rate is low, as an epigenetically inherited trait when the switching rate is intermediate, or as a random trait when the switching rate is high. We find that epigenetic inheritance of the mutation rate results in the fastest rates of adaptation on artificial and empirical fitness landscapes for most biologically realistic parameter sets. Populations with an intermediate switching rate are able to maintain the coupling of a mutator phenotype and pre existing genetic mutations, helpful in crossing fitness valleys. Further, epigenetic inheritance allows the population to quickly revert to low mutation rates once adaptation is achieved, avoiding the accumulation of deleterious mutations associated with mutators. 
Our results provide a rationale for the evolution of epigenetic inheritance of mutation rate, suggesting that it could have been selected to facilitate adaptive evolution.

Different subsets of the tRNA pool in human cells are expressed in different cellular conditions. The 'proliferation-tRNAs' are induced upon normal and cancerous cell division, while the 'differentiation-tRNAs' are active in non-dividing, differentiated cells. Here we examine the essentiality of the various tRNAs upon cellular growth and arrest. We established a CRISPR-based editing procedure with sgRNAs that each target a tRNA family. We measured tRNA essentiality for cellular growth and found that most proliferation-tRNAs are essential compared to differentiation-tRNAs in rapidly growing cell lines. Yet in more slowly dividing lines, the differentiation-tRNAs were more essential. In addition, we measured the essentiality of each tRNA family upon response to cell cycle arresting signals. Here we detected a more complex behavior with both proliferation-tRNAs and differentiation-tRNAs showing various levels of essentiality. These results provide the so-far most comprehensive functional characterization of human tRNAs with intricate roles in various cellular states.

Tracing evolutionary processes that lead to fixation of genomic variation in wild bacterial populations is a prime challenge in molecular evolution. In particular, the relative contribution of horizontal gene transfer (HGT) vs. de novo mutations during adaptation to a new environment is poorly understood. To gain a better understanding of the dynamics of HGT and its effect on adaptation, we subjected several populations of competent Bacillus subtilis to a serial dilution evolution on a high-salt-containing medium, either with or without foreign DNA from diverse pre-adapted or naturally salt tolerant species. Following 504 generations of evolution, all populations improved growth yield on the medium. Sequencing of evolved populations revealed extensive acquisition of foreign DNA from close Bacillus donors but not from more remote donors. HGT occurred in bursts, whereby a single bacterial cell appears to have acquired dozens of fragments at once. In the largest burst, close to 2% of the genome has been replaced by HGT. Acquired segments tend to be clustered in integration hotspots. Other than HGT, genomes also acquired spontaneous mutations. Many of these mutations occurred within, and seem to alter, the sequence of flagellar proteins. Finally, we show that, while some HGT fragments could be neutral, others are adaptive and accelerate evolution.

Programmed ribosomal frameshifting (PRF) is the controlled slippage of the translating ribosome to an alternative frame. This process is widely employed by human viruses such as HIV and SARS coronavirus and is critical for their replication. Here, we developed a high-throughput approach to assess the frameshifting potential of a sequence. We designed and tested >12,000 sequences based on 15 viral and human PRF events, allowing us to systematically dissect the rules governing ribosomal frameshifting and discover novel regulatory inputs based on amino acid properties and tRNA availability. We assessed the natural variation in HIV gag-pol frameshifting rates by testing >500 clinical isolates and identified subtype-specific differences and associations between viral load in patients and the optimality of PRF rates. We devised computational models that accurately predict frameshifting potential and frameshifting rates, including subtle differences between HIV isolates. This approach can contribute to the development of antiviral agents targeting PRF.

Most human genes are alternatively spliced, allowing for a large expansion of the proteome. The multitude of regulatory inputs to splicing limits the potential to infer general principles from investigating native sequences. Here, we create a rationally designed library of >32,000 splicing events to dissect the complexity of splicing regulation through systematic sequence alterations. Measuring RNA and protein splice isoforms allows us to investigate both cause and effect of splicing decisions, quantify diverse regulatory inputs and accurately predict (R² = 0.73-0.85) isoform ratios from sequence and secondary structure. By profiling individual cells, we measure the cell-to-cell variability of splicing decisions and show that it can be encoded in the DNA and influenced by regulatory inputs, opening the door for a novel, single-cell perspective on splicing regulation.

The translation machinery and the genes it decodes co-evolved to achieve production throughput and accuracy. Nonetheless, translation errors are frequent, and they affect physiology and protein evolution. Mapping translation errors in proteomes and understanding their causes is hindered by lack of a proteome-wide experimental methodology. We present the first methodology for systematic detection and quantification of errors in entire proteomes. Following proteome mass spectrometry, we identify, in E. coli and yeast, peptides whose mass indicates specific amino acid substitutions. Most substitutions result from codon-anticodon mispairing. Errors occur at sites that evolve rapidly and that minimally affect energetic stability, indicating selection for high translation fidelity. Ribosome density data show that errors occur at sites where ribosome velocity is higher, demonstrating a trade-off between speed and accuracy. Treating bacteria with an aminoglycoside antibiotic or deprivation of specific amino acids resulted in particular patterns of errors. These results reveal a mechanistic and evolutionary basis for translation fidelity.
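
The idea of reading a substitution off a peptide's mass can be illustrated with a small sketch. This is not the authors' pipeline: the function name `candidate_substitutions`, the tolerance, and the reduced residue table are assumptions made for illustration. The masses are standard monoisotopic residue masses in daltons.

```python
# Monoisotopic residue masses (Da) for a few amino acids.
# Illustrative subset only; Leu and Ile share the same mass and
# cannot be told apart by mass alone.
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "V": 99.06841,
    "L": 113.08406,
}


def candidate_substitutions(observed_shift, tol=0.005):
    """Return (original, new) residue pairs whose mass difference
    matches the observed peptide mass shift within `tol` daltons."""
    hits = []
    for orig, m_orig in RESIDUE_MASS.items():
        for new, m_new in RESIDUE_MASS.items():
            if orig != new and abs((m_new - m_orig) - observed_shift) <= tol:
                hits.append((orig, new))
    return hits


if __name__ == "__main__":
    # A shift of about +14.016 Da is compatible with more than one pair,
    # so the mass alone narrows the candidates rather than deciding them.
    print(candidate_substitutions(14.01565))
```

Because several substitutions can produce the same shift, a real analysis would also need the position of the change within the peptide, which this sketch does not model.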

Splicing expands, reshapes, and regulates the transcriptome of eukaryotic organisms. Despite its importance, key questions remain unanswered, including the following: Can splicing evolve when organisms adapt to new challenges? How does evolution optimize inefficiency of introns' splicing and of the splicing machinery? To explore these questions, we evolved yeast cells that were engineered to contain an inefficiently spliced intron inside a gene whose protein product was under selection for an increased expression level. We identified a combination of mutations in Cis (within the gene of interest) and in Trans (in mRNA-maturation machinery). Surprisingly, the mutations in Cis resided outside of known intronic functional sites and improved the intron's splicing efficiency potentially by easing tight mRNA structures. One of these mutations hampered a protein's domain that was not under selection, demonstrating the evolutionary flexibility of multi-domain proteins as one domain functionality was improved at the expense of the other domain. The Trans adaptations resided in two proteins, Npl3 and Gbp2, that bind pre-mRNAs and are central to their maturation. Interestingly, these mutations either increased or decreased the affinity of these proteins to mRNA, presumably allowing faster spliceosome recruitment or increased time before degradation of the pre-mRNAs, respectively. Altogether, our work reveals various mechanistic pathways toward optimizations of intron splicing to ultimately adapt gene expression patterns to novel demands.

The localization of mRNAs encoding secreted/membrane proteins (mSMPs) to the endoplasmic reticulum (ER) likely facilitates the co-translational translocation of secreted proteins. However, studies have shown that mSMP recruitment to the ER in eukaryotes can occur in a manner that is independent of the ribosome, translational control, and the signal recognition particle, although the mechanism remains largely unknown. Here, we identify a cis-acting RNA sequence motif that enhances mSMP localization to the ER and appears to increase mRNA stability, and both the synthesis and secretion of secretome proteins. Termed SECReTE, for secretion-enhancing cis regulatory targeting element, this motif is enriched in mRNAs encoding secretome proteins translated on the ER in eukaryotes and on the inner membrane of prokaryotes. SECReTE consists of >= 10 nucleotide triplet repeats enriched with pyrimidine (C/U) every third base (i.e. NNY, where N = any nucleotide, Y = pyrimidine) and can be present in the untranslated as well as the coding regions of the mRNA. Synonymous mutations that elevate the SECReTE count in a given mRNA (e.g. SUC2, HSP150, and CCW12) lead to an increase in protein secretion in yeast, while a reduction in count led to less secretion and physiological defects. Moreover, the addition of SECReTE to the 3'UTR of an mRNA for an exogenously expressed protein (e.g. GFP) led to its increased secretion from yeast cells. Thus, SECReTE constitutes a novel RNA motif that facilitates ER-localized mRNA translation and protein secretion.
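
The motif definition above (at least 10 consecutive NNY triplets) can be sketched as a simple scan. This is a minimal illustration, not the scoring used in the paper: the function name `count_secrete`, the handling of overlapping frames, and the choice to count non-overlapping runs are all assumptions.

```python
import re


def count_secrete(seq, min_repeats=10):
    """Count non-overlapping runs of at least `min_repeats` consecutive
    NNY triplets (N = any nucleotide, Y = C or U), in any reading frame."""
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
    # Two arbitrary bases followed by a pyrimidine, repeated min_repeats+ times.
    pattern = re.compile(r"(?:[ACGU]{2}[CU]){%d,}" % min_repeats)
    return sum(1 for _ in pattern.finditer(rna))


if __name__ == "__main__":
    print(count_secrete("AAC" * 10))                   # one run of 10 triplets
    print(count_secrete("AAC" * 9))                    # too short to count
    print(count_secrete("AAC" * 10 + "GGG" + "AAU" * 10))  # two separate runs
```

The regular expression scans from every start position, so a run is found regardless of its frame relative to the start of the sequence. Since the motif can occur in untranslated as well as coding regions, the scan is deliberately not tied to the coding frame.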

Technological breakthroughs in the past two decades have ushered in a new era of biomedical research, turning it into an information-rich and technology-driven science. This scientific revolution, though evident to the research community, remains opaque to nonacademic audiences. Such knowledge gaps are likely to persist without revised strategies for science education and public outreach. To address this challenge, we developed a unique outreach program to actively engage over 100 high-school students in the investigation of multidrug-resistant bacteria. Our program uses robotic automation and interactive web-based tools to bridge geographical distances, scale up the number of participants, and reduce overall cost. Students and teachers demonstrated high engagement and interest throughout the project and valued its unique approach. This educational model can be leveraged to advance the massive open online courses movement that is already transforming science education.

In experimental evolution, scientists evolve organisms in the lab, typically by challenging them with new environmental conditions. How best to evolve a desired trait? Should the challenge be applied abruptly, gradually, periodically, sporadically? Should one apply chemical mutagenesis, and do strains with high innate mutation rate evolve faster? What are ideal population sizes of evolving populations? There are endless strategies, beyond those that can be explored by individual labs. We therefore arranged a community challenge, Evolthon, in which students and scientists from different labs were asked to evolve Escherichia coli or Saccharomyces cerevisiae for an abiotic stress: low temperature. About 30 participants from around the world explored diverse environmental and genetic regimes of evolution. After a period of evolution in each lab, all strains of each species were competed with one another. In yeast, the most successful strategies were those that used mating, underscoring the importance of sex in evolution. In bacteria, the fittest strain used a strategy based on exploration of different mutation rates. Different strategies displayed variable levels of performance and stability across additional challenges and conditions. This study therefore uncovers principles of effective experimental evolutionary regimens and might prove useful also for biotechnological developments of new strains and for understanding natural strategies in evolutionary arms races between species. Evolthon constitutes a model for community-based scientific exploration that encourages creativity and cooperation.

The epigenetic dynamics of induced pluripotent stem cell (iPSC) reprogramming in correctly reprogrammed cells at high resolution and throughout the entire process remain largely undefined. Here, we characterize conversion of mouse fibroblasts into iPSCs using Gatad2a-Mbd3/NuRD-depleted and highly efficient reprogramming systems. Unbiased high-resolution profiling of dynamic changes in levels of gene expression, chromatin engagement, DNA accessibility, and DNA methylation were obtained. We identified two distinct and synergistic transcriptional modules that dominate successful reprogramming, which are associated with cell identity and biosynthetic genes. The pluripotency module is governed by dynamic alterations in epigenetic modifications to promoters and binding by Oct4, Sox2, and Klf4, but not Myc. Early DNA demethylation at certain enhancers prospectively marks cells fated to reprogram. Myc activity drives expression of the essential biosynthetic module and is associated with optimized changes in tRNA codon usage. Our functional validations highlight interweaved epigenetic- and Myc-governed essential reconfigurations that rapidly commission and propel deterministic reprogramming toward naive pluripotency.

Additional data files

The following additional data are available with the online version of this paper: a figure showing nucleosome patterns surrounding the TSS, start codon, and stop codon in yeast and fly (Additional data file 1); a figure showing illustrative genes with nucleosomal peaks at coding boundaries (Additional data file 2); a figure showing DNA methylation level surrounding the transcript and coding region boundaries in the mouse liver (Additional data file 3); a figure showing nucleosome occupancy according to differential Pol II elongation efficiency (Additional data file 4); a figure comparing densities of Ser5-phosphorylated and unphosphorylated Pol II (Additional data file 5); a figure showing Pol II density with higher and lower nucleosome occupancy (Additional data file 6); a figure showing DNA bending propensity at the start and stop codons (Additional data file 7); a figure demonstrating the length of the 5' UTR and gene expression level for genes with high CpG density around the start codon (Additional data file 8); and a figure showing the overall patterns of nucleosome occupancy and DNA methylation level inside the protein coding region (Additional data file 9).
