How would one determine if upregulation of one protein leads to overexpression of another?

Again, I'm new to biology and have a bunch of questions. Does it depend on the proteins involved? Or are there basic co-expression procedures one could implement to determine exactly how much upregulation of one protein leads to overexpression of another protein?

To demonstrate a causal relationship where protein A leads to an increase in protein B, you need to experimentally increase the level of A and measure whether B increases. Over-expression of protein A is typically done from a plasmid containing its coding sequence, driven by a strong promoter that causes high levels of protein synthesis. See this Wikipedia page for some pointers. Relative protein levels can be quantified by western blotting. The experimental details depend a lot on what cell type and protein you are studying, of course.
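As a minimal sketch of the quantification step, assume the band intensities have already been measured by densitometry and that a loading control (for example a housekeeping protein) was probed on the same blot. The level of B can then be expressed as a normalized fold change. The function name and the intensity values below are hypothetical, not real data.

```python
def fold_change(target_treated, loading_treated, target_control, loading_control):
    """Loading-control-normalized fold change of a target band (treated vs control)."""
    if min(loading_treated, loading_control, target_control) <= 0:
        raise ValueError("intensities used as denominators must be positive")
    # Normalize each target band to the loading control from the same lane.
    normalized_treated = target_treated / loading_treated
    normalized_control = target_control / loading_control
    return normalized_treated / normalized_control


# Hypothetical densitometry values in arbitrary units.
fc_b = fold_change(
    target_treated=5400.0,   # protein B band, cells overexpressing A
    loading_treated=1800.0,  # loading control, same lane
    target_control=2000.0,   # protein B band, empty-vector control
    loading_control=2000.0,  # loading control, same lane
)
print(f"Protein B fold change after overexpressing A: {fc_b:.2f}")
```

A fold change clearly above 1 across independent biological replicates would support the claim that raising A raises B; a single blot is not enough to establish it.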

Chronic systemic exposure to IL6 leads to deregulation of glycolysis and fat accumulation in the zebrafish liver

Inflammation is a constant in Non-Alcoholic Fatty Liver Disease (NAFLD), although their relationship is unclear. In a transgenic zebrafish system with chronic systemic overexpression of human IL6 (IL6-OE) we show that inflammation can cause intra-hepatic accumulation of triglycerides. Transcriptomics and proteomics analysis of the IL6-OE liver revealed a deregulation of glycolysis/gluconeogenesis pathway, especially a striking down regulation of the glycolytic enzyme aldolase b. Metabolomics analysis by mass spectrometry showed accumulation of hexose monophosphates and their derivatives, which can act as precursors for triglyceride synthesis. Our results suggest that IL6-driven repression of glycolysis/gluconeogenesis, specifically aldolase b, may be a novel mechanism for fatty liver. This mechanism may be relevant for NAFLD in lean individuals, an emerging class of NAFLD prevalent more in Asian Indian populations.


Aerobic organisms employ critical control strategies to ensure proper oxygen supply through various physiological and metabolic cellular signaling networks. The inability to meet cellular oxygen demands, termed hypoxia, results in the activation of specific cellular stress responses [1, 2]. Hypoxic stress induces global gene expression changes in order to help cells adapt and survive by altering the cell’s metabolic and angiogenic pathways and restoring oxygen homeostasis [3,4,5,6,7,8,9,10]. If these repair and adaptive mechanisms fail, cells modify their gene expression profiles to induce programmed cell death [11,12,13,14,15,16]. Although active hypoxia signaling networks are necessary during embryogenesis and development [17,18,19], hypoxic conditions either diminish normally, or they contribute to pathological events in mature organisms [20,21,22,23].

Efficient activation of hypoxia signaling and angiogenesis is critical, for example, after stroke [24], myocardial infarction [25], and other ischemic events [26,27,28,29]. Alternatively, metabolic adaptation to low oxygen levels and the related tissue revascularization allows for the survival and progression of the majority of human tumors [30,31,32], and contributes to macular degeneration [33,34,35,36], glaucoma progression [37], and diabetic retinopathy [38,39,40,41]. Thus, the discovery and development of therapeutic strategies exploiting hypoxia-related cellular networks are of great interest to modern medicine, as evidenced by the awarding of the 2019 Nobel Prize in Physiology or Medicine to Drs. Semenza, Ratcliffe, and Kaelin for their research into how cells detect oxygen and react to hypoxia [42,43,44,45,46].

The main goal of the cellular response to hypoxia is to promote cell survival and restore oxygen homeostasis. This goal, however, is accompanied by changes in cellular organelles, namely deregulation of mitochondrial and endoplasmic reticulum (ER) function, that are reflected in perturbations in protein folding and trafficking [4, 47,48,49,50,51,52,53]. Erratic protein folding activates another specific stress response pathway, the unfolded protein response (UPR). The UPR promotes cellular survival by restoring endoplasmic and mitochondrial homeostasis through its distinct signaling networks [54,55,56], but if unsuccessful, the UPR induces cell death [57,58,59].

Although activation of the UPR supports surviving hypoxia, it can also impair cellular survival [60]. The ER, for example, is responsible for folding and maturation of transmembrane and secretory proteins [61,62,63,64,65,66,67,68,69] that include proangiogenic receptors and ligands such as vascular endothelial growth factor (VEGF) and erythropoietin (EPO) that are critical for hypoxia-induced angiogenesis and erythropoiesis, respectively [70,71,72]. Thus, although underappreciated, understanding mutual crosstalk between these stress response pathways is important for understanding and developing therapeutic interventions in cardiovascular diseases and cancer. Nevertheless, despite the extensive studies on both of these stress responses, the resulting consequences of their collective activation remain largely unexplained and are mainly limited to in vitro cell culture-based models. In this review, we summarize these two cell survival pathways and the implications of UPR involvement in the hypoxia cellular response pathway.


The Hsf4 -/- mouse exhibits aberrant fiber development in the pericentric region of the lens, partially mimicking human HSF4 mutation cataracts

In order to examine the function of HSF4 in lens formation in detail, we generated a targeted disruption of the mouse Hsf4 gene by homologous recombination in 129S3 embryonic stem (ES) cells. In the vector, exons 3–5, which encode the DNA-binding domain, were replaced with the neomycin-resistance gene followed by the PGK cassette (Figure 1A). The ES cells (129S3 strain, derived from agouti) were electroporated with the linearized targeting vector under positive-negative selection [10], and eight correctly targeted clones were obtained (data not shown). Correctly targeted ES cell clones were injected into C57Bl/6 blastocysts (black), and the genotypes of the offspring were analyzed by PCR to identify wild-type (+/+), heterozygous (+/-), and homozygous (-/-) animals (Figure 1B). As expected, the ratio of phenotypes was in accordance with Mendelian frequency. Hsf4 -/- mice are largely normal across all developmental stages except for cataract formation. Under slit-lamp detection, we observed a cataract phenotype in the Hsf4 knockout mouse (Figure 1C). Opacity of the lens of the Hsf4 -/- mouse appeared at an early postnatal stage and increased with age.

Hsf4 knockout results in lens abnormalities and developmental defects, notably swollen and loose fiber structure. (A) The wild-type Hsf4 locus, the targeting vector, and the allele following homologous recombination are shown. The targeting vector was used to replace exons 3–5, encoding the DNA-binding domain, with a neomycin resistance gene. (B) Analysis of genomic DNA, cDNA, and protein of the Hsf4 gene from Hsf4 knockout mouse by PCR, RT-PCR, and Western blotting analysis. PCR primers flanking the targeted exons amplified 2.6-kb and 1.6-kb bands in Hsf4-/- and Hsf4+/+ mice, respectively. Loss of wild-type expression of the Hsf4 gene in Hsf4-/- mice was confirmed by the lack of a PCR product after reverse transcription-PCR of total RNA with one of the primers within the targeted region. Western blot analysis of lens extracts of 8-week-old mice using a specific antibody against mouse Hsf4 protein showed that Hsf4 protein was absent in Hsf4-/- mice. (C) Slit-lamp images of mouse lens and histological examination of lens nuclear region sections of 8-week-old mice. Arrows indicate normal and loose fibers in lens nuclear regions of Hsf4+/+, Hsf4+/-, and Hsf4-/- mice. Bar, 50 μm. (D) SEM images showing the loose and abnormal fibers of Hsf4 knockout mice compared with the normal structures of wild-type mice. Bar, 500 μm above and 5 μm below.

The structure of the Hsf4 -/- lens fibers became loose, and a vacuole-like cavity that appeared in lens fiber cells at day E15.5 (data not shown) became more severe than in wild-type lenses. Undegraded nuclei were clearly seen under a light microscope. However, the bow region of the lens, where the lens epithelial cells differentiate into fibers, was generally normal (Figure 1C). SEM images revealed loose fibers in the lenses of Hsf4 knockout mice that showed much less interaction than those of their wild-type counterparts (Figure 1D).

HSF4 has a crucial role in the expression of γ-crystallins, notably γS-crystallin, during postnatal maturation of the lens

The Hsf4 -/- mouse lens is fragile and much lighter, but of a similar size, compared with its wild-type counterpart (Figure 2A). More than 90% of lens proteins are in the soluble form and include a variety of crystallins. In mammals, the crystallins are αA and αB; βB1, βB2, βB3, βA3/A1, and βA4; and γA, γB, γC, γD, γE, γF, and γS. In mice, the γ-crystallins are the major contributor to the weight of the mature lens proteins. In humans, β-crystallins and γ-crystallins are in almost equal proportions, with a slightly lower amount of α-crystallins. Fujimoto et al. detected markedly reduced expression of γ(A-F)-crystallin genes in adult HSF4-null mice, even at 2 days old, but the expression levels of γ(A-F)-crystallin genes were normal in the lens of E15.5 HSF4-null embryos. They used chromatin IP (ChIP) analysis to show that HSF4 binds upstream of the γF-crystallin gene, suggesting that HSF4 regulates expression of the γF-crystallin gene [8]. In contrast, Min et al. did not observe a significant reduction of γ-crystallin genes in 1-day-old to 28-day-old Hsf4-null mice, except the γF-crystallin gene, which showed reduced expression in 10-day-old Hsf4-null mice compared to wild-type mice [9]. These conflicting results may be due to differences in Hsf4-null mice construction, differences in the genetic background of the mice being tested, and other unknown factors. However, neither Min et al. nor Fujimoto et al. studied another important crystallin protein, γS-crystallin. During maturation of the mouse lens, the level of γS-crystallin increases fivefold, replacing other types of crystallins and accounting for up to 15% of the total weight [11]. This finding prompted us to examine the expression of γ-crystallins (especially γS-crystallin) in the lens of knockout and wild-type mice. We quantified the mRNA expression of γ-crystallins and assayed the transcriptional activity of Hsf4.
We found that Hsf4 -/- mice have reduced expression levels of all subtypes of γ-crystallins directly after birth compared with wild-type mice, and there was almost a complete lack of γ-crystallin expression at 8 weeks old. In wild-type mice, the level of γS-crystallin mRNA indicates that γS-crystallin is only a small proportion of total γ-crystallin at birth, and the level increases to make it the major γ-crystallin at 8 weeks old (Figure 2B). The HSF4 and γS-crystallin genes are first expressed at the late embryonic stages and are up-regulated in late embryonic development [12, 13]. The γ(A-F)crystallin genes are located in a gene cluster and are regulated by Sox1 and Maf, but the regulation of γS-crystallin gene (Crygs) has not been well characterized [14–17]. To further evaluate the role of HSF4 in the regulation of γS-crystallin expression, we analyzed the expression of γS-crystallin mRNA in the human lens epithelial cell line SRA01/04 under conditions of hHSF4b overexpression [18]. The γS-crystallin mRNA expression level was tenfold higher in hHSF4b-overexpressing cells than in the control (Figure 2C). As the γS-crystallin promoter region contains a heat-shock element (Crygs-HSE) that is less conserved than in the promoters of γ(A-F)-crystallin (Figure 2D), we asked whether HSF4b can interact directly with the endogenous promoter of CRYGS to regulate its expression. We used ChIP assays to test this hypothesis. The immunoprecipitation of solubilized chromatin prepared from HSF4b-expressing SRA01/04 cells with an anti-HSF4b antibody was followed by PCR using primers that target the potential HSF4b-binding site in the CRYGS promoter region. PCR yielded the expected band in the cells expressing HSF4b, and no band for the normal goat IgG control (Figure 2E). Additionally, we assumed that HSF4b would bind CRYGS-HSE to activate γS-crystallin transcription. 
Under this assumption, the CRYGS-luc vector, which contains an upstream sequence of the γS-crystallin gene, was constructed, and luciferase activity levels with and without overexpression of HSF4b were determined in the SRA01/04 cell line. The relative luciferase activity with overexpression of HSF4b was about sixfold greater than that of the system without overexpression. The dominant-negative HSF4b, which had lost the DNA-binding domain of HSF4b, had no significant effect on luciferase activity (Figure 2F). These results indicate that HSF4b could regulate the expression of γS-crystallin by binding to Crygs-HSE.

γ-crystallin expression during lens development. (A) Lens weight in 8-week-old wild-type and Hsf4 knockout mice. Ten wild-type lenses and 16 knockout lenses were used to calculate the average weight. (B) mRNA levels of all γ-crystallins were analyzed using quantitative RT-PCR. mRNA was isolated from lenses of newborn and 8-week-old mice. (C) Real-time PCR analysis of mRNA levels of γS-crystallin using specific primers. Total RNA was isolated from human lens epithelial cells (SRA01/04) transfected with HSF4b, shRNA for HSF4, or HSF4b plus shRNA for HSF4, respectively. (D) Promoter sequence alignment of seven γ-crystallins. Sequences identical to the classic heat-shock element (HSE) sequence are shown in red. Asterisks indicate the key nucleotides essential for heat-shock factor binding. (E) CRYGS gene promoter was PCR-amplified from chromatin immunoprecipitation-enriched DNA from human lens epithelial cells (SRA01/04) transfected with HSF4b. (F) Relative luciferase activity of the promoter-luciferase construct (CRYGS-luc) by transfection with HSF4a or HSF4b in human lens epithelial cells (SRA01/04). The transfections were performed in hexaplets and the Renilla luciferase plasmid was used as normalization control. The results are shown as means with standard deviations. The classic HSE-luc was used as a positive control.
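The Renilla normalization described for panel F can be sketched as follows. This is an illustrative calculation only, assuming per-well firefly and Renilla readings are available; the function names and readings are hypothetical, and the source does not describe the exact computation used.

```python
from statistics import mean, stdev


def relative_luciferase(firefly, renilla):
    """Per-well firefly/Renilla ratios; Renilla corrects for transfection efficiency."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must be paired per well")
    return [f / r for f, r in zip(firefly, renilla)]


def fold_activation(test_ratios, control_ratios):
    """Mean and sample standard deviation of test ratios scaled to the control mean."""
    control_mean = mean(control_ratios)
    scaled = [ratio / control_mean for ratio in test_ratios]
    return mean(scaled), stdev(scaled)


# Hypothetical readings (arbitrary light units), three wells per condition.
control = relative_luciferase([100.0, 100.0, 100.0], [50.0, 50.0, 50.0])
with_hsf4b = relative_luciferase([600.0, 660.0, 540.0], [50.0, 50.0, 50.0])

fold_mean, fold_sd = fold_activation(with_hsf4b, control)
print(f"Fold activation: {fold_mean:.2f} +/- {fold_sd:.2f}")
```

Scaling to the mean of the reporter-only wells puts the control at 1, so the reported value reads directly as "fold over control".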

The γS-crystallin mutant rncat is a recessive cataract mouse model with a G to A transition point mutation at position 489 in exon 3 of the γS-crystallin gene [19]. In heterozygous rncat mice, the lens is nearly normal. Our results suggested that HSF4 regulates the expression of the γS-crystallin gene (Figure 2). If this is true, lack of HSF4 would decrease the production of wild-type γS-crystallin and might facilitate cataract development. For reference, lens opacity of the rncat mouse appeared at 11 days old in the nuclear region [19]. Interestingly, when we intercrossed the Hsf4 -/- mouse with the rncat mouse, the offspring (Hsf4 -/- /rncat/+) developed cataract in the posterior part of the lens, whereas the lenses of the Hsf4 +/+ /rncat/+ mice remained basically clear (Figure 3A). The cataract in double null Hsf4 -/- /rncat/rncat mice developed earlier, appearing at 7 days old, and seemed more severe than that in Hsf4 +/+ /rncat/rncat mice (Figure 3A). Lack of Hsf4 worsens the lens fiber defect in the γS-crystallin mutant mouse rncat [see Additional file 1]. The mRNA levels of γS-crystallin in the Hsf4 -/- /rncat/+ and Hsf4 -/- /rncat/rncat mice were reduced by more than 90% compared with the Hsf4 +/+ and rncat/rncat mice (Figure 3B). To a certain degree, our results further suggest that Hsf4 disruption reduces Crygs expression and accompanies cataract formation in the offspring of this intercross.

Intercross of Hsf4 knockout mouse and rncat mouse. (A) Slit-lamp cataract evaluation of mice with different genotype compositions. Lack of Hsf4 induced a more severe phenotype in heterozygous and homozygous rncat mice. (B) The mRNA levels of γS-crystallin in the Hsf4+/+, rncat/rncat, Hsf4-/-/rncat/rncat, and Hsf4-/-/rncat/+ mice were analyzed by semi-quantitative RT-PCR.

Intermediate filament genes (Bfsp1/2) are down-regulated in the lens of the Hsf4 -/- mouse

An important aspect of Hsf4 -/- cataract formation is the abnormal development of lens fibers that cannot be explained completely by crystallin regulation. The loose fiber structure of the lens of the Hsf4 knockout mouse is similar to that of the Bfsp1/2 knockout mouse. Knockout of Bfsp1 or Bfsp2 resulted in a loose fiber structure with fewer connection proteins among the fibers [20, 21]. Therefore, we quantified the levels of Bfsp1 and Bfsp2 mRNA in the lens of Hsf4 knockout mice. At birth, the level of Bfsp2 mRNA in Hsf4 knockout mice was half that of wild-type mice, whereas the level of Bfsp1 mRNA was similar to that of wild-type mice. At 8 weeks old, the levels of both Bfsp1 and Bfsp2 mRNA in the lens of Hsf4 knockout mice were reduced by more than sevenfold compared with wild-type mice, except for the level of Bfsp1 mRNA at birth (Figure 4A). It has been suggested that the 129S3 strain mouse has a deletion that causes the loss of exon 2 from Bfsp2 mRNA and dramatically reduces mRNA levels of Bfsp2. The Bfsp2 protein in this strain was undetectable by antisera to the wild-type protein [22, 23]. Since we obtained the expected PCR product size from Hsf4 -/- mice using primers from the deletion region of the Bfsp2 gene (Figure 4B), there is at least one copy of the C57BL/6J Bfsp2 allele in the Hsf4 -/- mouse. Although Hsf4 -/- mice carry at least one copy of the highly expressed C57BL/6J Bfsp2 allele, Hsf4 -/- mice had a much lower level of expression of Bfsp2 compared to the 129S3 wild-type mice. Therefore, lack of Hsf4 results in reduced expression of Bfsp2.
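The source does not state how relative mRNA levels were derived from the real-time PCR data. One widely used approach is the 2^(-ΔΔCt) method, sketched here under the assumptions of roughly 100% amplification efficiency and a stably expressed reference gene. The function name and the Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Relative expression by the 2^(-ddCt) method (assumes ~100% PCR efficiency)."""
    # dCt: target gene Ct relative to the reference gene, within each sample.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    # ddCt: sample relative to the calibrator (here, wild-type).
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: knockout lens vs wild-type lens (calibrator).
ratio = relative_expression(
    ct_target_sample=26.0,      # target gene, knockout
    ct_ref_sample=18.0,         # reference gene, knockout
    ct_target_calibrator=23.0,  # target gene, wild-type
    ct_ref_calibrator=18.0,     # reference gene, wild-type
)
print(f"Knockout expression relative to wild-type: {ratio:.3f}")
```

With these made-up numbers the target crosses threshold three cycles later in the knockout, which corresponds to an eightfold reduction.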

HSF4 regulates lens-specific bead filaments Bfsp1 and Bfsp2. (A) Real-time PCR to determine mRNA levels of Bfsp1 and Bfsp2 in lenses of newborn and adult wild-type and Hsf4 knockout mice. (B) PCR analysis of Bfsp2 gene deletion in Hsf4 -/- , Hsf4 +/- , 129S3 and C57BL/6J mice using primers from the deleted region of the Bfsp2 gene in 129S3 mice. The results suggest that there is at least one copy of the C57BL/6J Bfsp2 allele in the Hsf4 -/- mouse. (C) Real-time PCR analysis of mRNA levels of Bfsp1 and Bfsp2 after transfection of SRA01/04 with HSF4b, shRNA for HSF4, or HSF4b plus shRNA for HSF4. (D) Promoter sequence alignment of Bfsp1 and Bfsp2. Sequences identical to the classic heat-shock element (HSE) sequence are shown in red. Asterisks indicate the key nucleotides essential for heat-shock factor binding. (E) Bfsp1 and Bfsp2 gene promoters were PCR-amplified from chromatin immunoprecipitation-enriched DNA from human lens epithelial cells (SRA01/04) transfected with HSF4b. (F) Relative luciferase activities of the promoter-luciferase constructs (Bfsp1-luc or Bfsp2-luc) after transfection with HSF4a or HSF4b in human lens epithelial cells (SRA01/04). The transfections were performed in hexaplets and the Renilla luciferase plasmid was used as normalization control. The results are shown as means with standard deviations. The classic HSE-luc was used as a positive control.

Overexpression of hHSF4b in SRA01/04 cells up-regulated transcription of Bfsp1 and Bfsp2, and this up-regulation could be inhibited specifically by short hairpin RNA (shRNA) targeting HSF4b (Figure 4C). As both Bfsp1 and Bfsp2 promoters have the less-conserved HSE sequence (Figure 4D), we asked whether HSF4 can directly bind the promoter region of Bfsp1 and Bfsp2. We used ChIP assays and detected the expected bands of the promoter sequences of the Bfsp1 and Bfsp2 genes in cells expressing HSF4b (Figure 4E). This result suggests that HSF4 can bind to the promoter regions of the Bfsp1 and Bfsp2 genes. We then used the dual-luciferase system to assess whether HSF4 activates the expression of Bfsp1 and Bfsp2. Human Bfsp1-luc or Bfsp2-luc vectors were co-transfected with or without HSF4b into SRA01/04 cells. The relative luciferase activities of HSF4b co-transfected with Bfsp1-luc or Bfsp2-luc were approximately eightfold that of cells transfected with Bfsp1-luc or Bfsp2-luc alone (Figure 4F). These results indicate that HSF4 may bind specifically to the Bfsp1-HSE/Bfsp2-HSE sequences and regulate expression of the intermediate filament proteins Bfsp1 and Bfsp2.

The 2D electrophoretic analysis of HSF4 -/- lens components

To analyze changes in the lens components of HSF4 -/- mice, we used 2D electrophoretic (2D-E) analysis to generate maps of lens lysates from 8-week-old Hsf4 -/- and wild-type (129S3) mice, and identified selected spots using liquid chromatography (LC) with tandem mass spectrometry (MS/MS). By comparison with published 2D-E maps of the mouse lens [11, 24], it was possible to identify most of the crystallin proteins in the 2D-E maps of both Hsf4 -/- and wild-type mice. The lenses of Hsf4 -/- and wild-type mice generally had similar 2D-E protein expression patterns with silver staining. However, alterations of some spots were observed in the 2D-E map of the Hsf4 -/- lens, which generally had less crystallin protein than the wild-type lens (Figure 5A). We selected and identified some altered spots in the lens map of Hsf4 -/- mice, where signals were either absent or weaker than in wild-type mice (Figure 5B). Interestingly, 5 of 21 spots were identified by MS as αA-crystallin. These αA-crystallin molecules had a molecular mass of 15–25 kDa and migrated to the acidic region (pI 4.5–6.0) of the 2D-E gel. Other differentially expressed spots were identified as αB-crystallin, βA1-crystallin, βB2-crystallin, γC-crystallin, γB-crystallin, heat shock protein 27, and 7 unknown proteins. Due to the limited amount of sample, the terminal sequences of selected spots were not characterized. A similar pattern has been reported during aging or cataractogenesis in mice [24]. Truncation of αA-crystallin is probably caused by the activation of a class of calcium-activated proteases known as calpains, such as Lp82 or calpain2 [25, 26], which have 'clipping' activity in vitro. We used real-time PCR to determine the levels of Lp82 and calpain2 mRNAs in the lens of Hsf4 knockout mice and of wild-type mice. The mRNA levels of Lp82 and calpain2 in Hsf4 knockout neonatal mice were reduced fourfold compared with wild-type mice; this reduction was even greater in 8-week-old mice (Figure 5C).
Therefore, loss of αA-crystallin and reduced expression of Lp82 and calpain2 may be correlated with Hsf4 disruption.

Hsf4 knockout mouse lacks lens-specific αA-crystallin modification. (A) 2D electrophoresis map of lens proteins of 8-week-old Hsf4 -/- and Hsf4 +/+ mice. There are obvious differences between the two 2D gels in the boxed regions. The spots that differ most in intensity between Hsf4 -/- and Hsf4 +/+ mice are indicated with red circles. (B) High-resolution images of the boxed regions in panel A, showing loss of post-translationally modified αA-crystallin spots in the lenses of Hsf4 -/- mice. In Hsf4-/- mice, some truncated αA-crystallin fragments were missing from the 2D gel. Proteins were identified by LC/MS sequencing. The arrows point to the truncated αA-crystallins. (C) Real-time PCR determining mRNA levels of calpain2 and Lp82 in newborn and adult wild-type and Hsf4 knockout mice.


Stress treatment triggers a variety of responses across time

To compare the contributions of the mRNA and protein expression response in a dynamic system, we designed a time-course experiment of mammalian cells being subjected to ER stress. We subjected HeLa cells to 2.5 mM DTT-induced ER stress over a 30-h period, sampling at eight time points (0, 0.5, 1, 2, 8, 16, 24, and 30 h) (Appendix Fig S1). In this setup, DTT had a half-life of 4 h (Appendix Fig S2). We first conducted a number of assays to characterize the cellular phenotype in response to the treatment (Fig 1A, Appendix Fig S3). For example, since the time course spanned more than one cell doubling of 24 h, we tested how the stress affected cell proliferation, as measured by changes in cell density. The cell density decreased during the first 16 h, after which it increased, suggesting that a fraction of the cell population underwent apoptosis, while surviving cells proliferated normally (Fig 1A, upper panel).

Figure 1. Cells undergo a complex response to DTT treatment

  A. We estimated the degree of active cell division based on the cell density changes, the distribution of the DNA content, and the degree of active mitosis. Top panel: Bar graphs show numbers of live cells, with mean and standard deviations. Black lines, DTT treatment time. Middle panel: Quantitative analysis of cell cycle phases by flow cytometry using propidium iodide staining of DNA for cells treated with DTT for different periods of time. The 2N, 4N peaks and S-phase plateau were observed in all time points, suggesting active cell division. Bottom panel: Immunofluorescence experiments show mitotic nuclei in red (anti-phospho-histone H3 (Ser10) antibody) and other nuclei in blue (DAPI). Mitotic nuclei were observed throughout the entire experiment. The ratio between the number of mitotic and all nuclei was similar among all the stress phases (not shown). White arrows, apoptotic nuclei. All experiments were performed in triplicate. The complete data are in Appendix Fig S3.
  B. Summary of function enrichment of mRNA expression changes (FDR < 0.05, *P < 0.001, **P < 0.0001, and ***P < 0.00001). The corresponding expression data are shown in Appendix Fig S5. While some apoptosis occurred, remaining cells underwent intense unfolded protein and ER stress response.

This interpretation was confirmed by assays monitoring cell cycle progression and apoptosis: While apoptosis occurred during the first two hours of the experiment, later time points showed continued division of the majority of cells (Fig 1A, middle/lower panel; Appendix Fig S3). DNA labeling coupled to flow cytometry showed that apoptosis peaked at 2 h, with 45% cell death. Notably, the sample preparation for the mRNA and protein analysis discarded cellular debris; the results below hence focus on live cells. The same experiment also showed most of the population underwent active mitosis: As expected, most cells were in G1 stage across the entire experiment, and some cells continued DNA synthesis (Fig 1A, middle panel). This result was confirmed by immunocytochemistry using the anti-phospho-histone H3 (Ser10) antibody as a mitosis marker. The stressed and control groups were very similar with respect to distribution across the G2/M checkpoint and the M phase of active cell division (Fig 1A, lower panel). In sum, while suffering from a loss of cells during the early phase of the experiment, the surviving cell population continued division throughout the entire time course.

Genome-wide transcriptomics measurements confirmed this view and revealed roughly three phases of the response in which concerted changes happened: early (< 2 h), intermediate (2–8 h), and late (> 8 h) (Fig 1B, Appendix Fig S5). Genes related to transcription regulation and programmed cell death were significantly up-regulated during the early phase (FDR < 0.05). During the intermediate phase, genes involved in ER stress and UPR were highly expressed, while at the same time, genes related to translation elongation, RNA splicing and transport, and macromolecular complex assembly were suppressed, suggesting that stressed cells put basic cellular functions to a halt (FDR < 0.05). During the late phase, cells expressed genes involved in protein ubiquitination, lysosome, and glycoprotein and transmembrane protein synthesis, indicating the recovery of surviving cells (FDR < 0.05). The increase in lysosomal proteins is consistent with observations that the UPR remodels the lysosome as part of a pro-survival response (Ron & Hampton, 2004; Sriburi et al, 2004; Brewer et al, 2008; Elfrink et al, 2013).

The integrated transcriptome and proteome are highly dynamic

Next, we conducted a large-scale, quantitative proteomic analysis to complement the transcriptomic data. A variety of tests confirmed the quality of the proteomic data, for example, Western blots of selected proteins and analysis of housekeeping genes, and its reproducibility across the two biological replicates (Appendix Figs S11, S12 and S13). We quantitated a total of 3,235 proteins at least once across all time points and replicates and chose a high-confidence dataset of 1,237 proteins with complete time-series measurements across both replicates for further analysis. This high-confidence dataset is comparable in size to that of a recent study (Jovanovic et al, 2015). We also constructed an extended dataset with 2,131 proteins which showed similar results (Appendix Fig S19).

The high-confidence dataset was further processed to remove measurement noise and then used for the analyses described below. Protein concentrations spanned about five orders of magnitude (Appendix Table S1), which is similar to what other large-scale studies observe (Schwanhausser et al, 2011). Reproducibility of the protein measurements was high (R > 0.94 for seven of the eight time points, Appendix Fig S10), and the correlation with the corresponding mRNA concentrations was consistent across samples (Appendix Fig S13). Heatmaps of the integrated and clustered mRNA and protein expression values show that overall expression changes were similar between the two biological replicates (Fig 2, Appendix Figs S5, S9 and S14), but some discrepancies existed. In some cases, peak expression changes occurred at 2 h in one replicate and at 8 h in the other. To describe experimental reproducibility, we calculated a replicate consistency measure (RCM) that reports the Pearson's correlation coefficient between replicate time-series measurements of normalized, log-transformed RNA and protein concentrations. With a total of eight data points, a Pearson's correlation coefficient > 0.7 corresponds to a P-value < 0.05. For example, for GRP78, the RCM is 0.87/0.97, suggesting high reproducibility between the two biological replicates. Appendix Fig S13 displays the frequency distributions of all RCM values and shows a bias toward high values.
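The replicate consistency measure can be illustrated with a short sketch: a Pearson correlation between two replicate time series, plus a check of where the significance threshold for eight points falls. The series below are hypothetical log-transformed values, and the critical t value is the standard two-tailed 5% table value for 6 degrees of freedom; the authors' actual implementation is not described in the text.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two series of equal length with at least 3 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def critical_r(t_crit, n):
    """Smallest |r| that is significant, given the critical t for n - 2 df."""
    return t_crit / math.sqrt(t_crit ** 2 + (n - 2))


# Two-tailed 5% critical value of Student's t with 6 degrees of freedom.
T_CRIT_DF6 = 2.447
r_crit = critical_r(T_CRIT_DF6, n=8)

# Hypothetical log-transformed concentrations at the eight time points.
replicate_1 = [0.0, 0.1, 0.3, 0.9, 1.8, 1.5, 1.2, 1.0]
replicate_2 = [0.0, 0.2, 0.2, 1.0, 1.6, 1.7, 1.1, 0.9]
rcm = pearson(replicate_1, replicate_2)

print(f"critical r for n = 8: {r_crit:.3f}")
print(f"RCM = {rcm:.2f}, reproducible at P < 0.05: {rcm > r_crit}")
```

The critical value comes out close to 0.7, in line with the threshold quoted above. In the paper the RCM is reported as a pair, one value for RNA and one for protein, so the same calculation would be run once per data type.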

Figure 2. RNA and protein expression changes are highly dynamic

In Fig 2, we identified several major groups with similar expression changes. For example, genes involved in the general stress response were significantly up-regulated during the intermediate and late phase of the experiment both at the mRNA and at the protein level (Appendix Fig S14). Translation-related and mitochondrial genes were down-regulated at the mRNA level, consistent with a halt in metabolic processes of stressed cells; however, these genes were up-regulated at the protein level.

A statistical tool identifies hidden regulatory signals

In the results described below, we used the PECA tool to extract regulatory signals from the RNA and protein time-series data. First, to illustrate the interpretation of PECA results, we show the example of GRP78 (HSPA5), an ER chaperone induced by ER stress and an important anti-apoptotic, pro-survival component of the UPR (Fig 3). The figure displays GRP78's mRNA and protein concentrations and the PECA results with respect to RNA- and protein-level rate ratios and significance (RCM = 0.87/0.97). We see that GRP78's mRNA and protein expression patterns across the treatment were very different from each other: mRNA concentrations peaked at 8 h and declined afterward, while protein concentrations continuously increased. Similar to the concentration data, RNA rate ratios for GRP78 peaked between two and eight hours and decreased later, while protein rate ratios plummeted in the beginning and returned to the pre-treatment level throughout the intermediate and late phase, resulting in continuously rising protein concentration. PECA identified significant regulation of RNA expression in both the early and the late phase, as well as significant protein-level regulation in the late phase of the experiment (FDR < 0.05; Fig 3, shaded area).

Figure 3. PECA deconvolutes expression data to extract regulatory information at the RNA and protein level

Importantly, PECA identified what was invisible from the inspection of concentration data alone: At around 16 h, RNA expression was significantly down-regulated, but protein concentrations continued to rise. This increase was realized through an up-regulation of protein expression, either through increased translation or through protein stabilization, and PECA sensitively identified this regulatory event. Notably, PECA was able to distinguish this up-regulation at the protein level from an increase in protein concentrations that is purely due to constant translation of the existing mRNAs at preceding time points, because it defines regulation as a significant change in synthesis and degradation rates from one time interval to the next. This regulatory event is also an example of the sometimes counterbalancing effects of RNA- and protein-level regulation (discussed below and in Appendix Fig S16). Incorporating overall data properties and measurement noise, PECA enabled us to quantitate regulatory events and extract them in a systematic and statistically consistent manner. The complete PECA results are provided in Dataset EV1.
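
The distinction between a rising concentration and an actual regulatory event can be illustrated with a toy model. The sketch below is not PECA's statistical model; it simply assumes that protein concentration follows dP/dt = k_syn * M - k_deg * P and integrates it with Euler steps. All rates, units, and names are hypothetical.

```python
def simulate_protein(mrna, k_syn, k_deg, p0, dt=0.1):
    """Euler integration of dP/dt = k_syn * M(t) - k_deg * P(t).

    mrna, k_syn and k_deg are sequences with one value per time step.
    """
    protein = [p0]
    for m, ks, kd in zip(mrna, k_syn, k_deg):
        p = protein[-1]
        protein.append(p + dt * (ks * m - kd * p))
    return protein

steps = 200
mrna = [1.0] * steps  # constant mRNA level (arbitrary units)

# Case 1: rates never change, yet protein still rises toward k_syn * M / k_deg.
constant = simulate_protein(mrna, [1.0] * steps, [0.5] * steps, p0=0.0)

# Case 2: the synthesis rate doubles halfway through, i.e. a regulatory event.
switched = simulate_protein(
    mrna, [1.0] * 100 + [2.0] * 100, [0.5] * steps, p0=0.0
)
```

In the first case the protein level increases at every step although no rate changes, so the rise alone does not indicate regulation. Only the second case contains a change in rates between intervals, which is the kind of event the text describes as regulation.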

Protein concentration changes occur in greater magnitude, but both regulatory levels contribute equally and independently

Before discussing the overall PECA outcomes, we examined general properties of the integrated mRNA and protein concentration data (Fig 4A–D). In general, both protein and mRNA concentrations hardly changed during the early phase of the experiment, but changed during the intermediate and late phases with different dynamics. Consistent with earlier studies (Murray et al, 2004), the transcriptome was comparatively static in our experiment, with average changes of about 1.5-fold. Transcript concentrations diverged maximally from the steady state at 8 h, after which they returned to the original levels. In contrast, protein concentrations continuously diverged from the beginning until the end of the experiment, with much less change during the late phase (Fig 4, Appendix Fig S15). The magnitude of change was also more pronounced for proteins than for mRNAs, as illustrated by the average (and range) of protein expression fold changes, which were larger than those for mRNAs (Fig 4, Appendix Table S1).

Figure 4. The proteome response is dominant during ER stress

  • A, B. Correlation (Pearson's R²) between normalized, absolute expression values at time 0 and the respective time points.
  • C, D. Average fold change (log base 10) and standard deviation of normalized, relative expression values.
  • E, F. The number of significantly regulated genes as determined by PECA (FDR < 0.05). We summarized the CPS probabilities of each gene by choosing the maximum probability across the time points in each of the three phases, which allows us to characterize how expression regulation (rate ratio) has shifted phase by phase. Labels E, I, and L mark the early, intermediate, and late phase, respectively.
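
The per-phase summary described for panels E and F can be sketched as follows. This is only an illustration: the assignment of time points to phases, the probability cutoff, and the gene values are hypothetical, and the sketch does not reproduce how PECA converts probabilities into an FDR.

```python
def summarize_by_phase(cps, phases):
    """Return the maximum probability per phase for one gene.

    cps:    dict mapping time point (hours) -> probability of a rate change
    phases: dict mapping phase label -> list of time points in that phase
    """
    return {label: max(cps[t] for t in points)
            for label, points in phases.items()}

def count_significant(genes, phases, cutoff):
    """Count genes whose per-phase maximum probability reaches the cutoff."""
    counts = {label: 0 for label in phases}
    for cps in genes.values():
        for label, prob in summarize_by_phase(cps, phases).items():
            if prob >= cutoff:
                counts[label] += 1
    return counts

# Hypothetical phase boundaries (hours) and probabilities for two genes.
phases = {"E": [0.5, 1], "I": [2, 8], "L": [16, 24, 30]}
genes = {
    "geneA": {0.5: 0.1, 1: 0.2, 2: 0.95, 8: 0.4, 16: 0.3, 24: 0.2, 30: 0.1},
    "geneB": {0.5: 0.05, 1: 0.1, 2: 0.5, 8: 0.97, 16: 0.99, 24: 0.6, 30: 0.2},
}
counts = count_significant(genes, phases, cutoff=0.9)
```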

To quantitate the contribution of the two regulatory levels to the cellular response in this system, we extracted significantly regulated genes by applying a 5% FDR cutoff to the PECA results. Figure 4E and F shows the number of significantly regulated genes per time point; Table 1 summarizes the results in a different manner. Most of the significant RNA-level regulation during the ER stress response occurred during the intermediate and also during the late phase (Fig 4, Table 1). Regulatory activity, that is, changing mRNA rate ratios, spiked around the 2- to 8-h mark, without additional regulation afterward: Concentrations simply returned slowly to initial values. A similar overall pattern was also observed for the protein level (Fig 4).

Table 1 shows the numbers of significant regulatory events for one of the replicates, grouped according to phase, level, and direction of the regulation. While most changes occurred during the intermediate phase, the distributions of these changes are consistent across phases and replicates even when different significance cutoffs were applied (not shown). The numbers are symmetrically distributed across the table, confirming the observation from Fig 4E and F that mRNA- and protein-level regulation contributes equally to the overall gene expression changes in this experiment, affecting similar numbers of genes. As Table 1 shows, if a gene was significantly regulated during a specific phase of the response, this regulation typically occurred at either the mRNA or the protein level, but not both at the same time: the numbers of genes in each of the square's corners are smaller than those in the middle rows or columns. However, some genes showed mRNA- and protein-level regulation moving in the same direction during the same phase, and others showed movement in opposite directions.

Table 1 already indicates that discordant regulation is comparatively rare: Only a few genes are listed in the lower left and upper right corners of the table (75 genes in total). One such example is GRP78 (Fig 3), for which mRNA expression is down-regulated and protein expression is up-regulated at the 16-h time point. An alternative way to identify discordant regulation confirmed this result, namely filtering for negative correlation between PECA's mRNA and protein time-course rate ratios in both replicates (Dataset EV1, Appendix Fig S16A and B). We then further refined this filtering and required not only opposing regulation, that is, at least one significant regulatory event at the mRNA and one at the protein level, but also constant protein concentrations, that is, changes smaller than 1.5-fold across both biological replicates. Such a scenario would indicate cases of “buffering” in which changes in mRNA concentrations are counterbalanced to result in no overall change at the protein level. Three out of the 75 genes passed this additional filtering and are shown in Appendix Fig S16C. One of these genes is HSC70 (RCM = 0.91/0.09), a chaperone discussed below (Fig 6A). Overall, we conclude that discordant regulation is rare, and the dynamics in the balance of synthesis and degradation of mRNA and protein occur in an independent manner.
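
The filtering steps described above can be written down as a small predicate. This is a sketch under stated assumptions: the record layout and field names are hypothetical, and the fold change is taken here as the ratio of the largest to the smallest protein concentration within a replicate, which the text does not specify.

```python
def max_fold_change(series):
    """Largest over smallest value in a concentration time series (assumed definition)."""
    return max(series) / min(series)

def is_buffered(gene, max_fc=1.5):
    """Apply the three filters: discordant rate ratios, opposing events, flat protein.

    gene is a dict with hypothetical keys:
      'rate_ratio_corr': mRNA vs protein rate-ratio correlation, one per replicate
      'rna_events', 'protein_events': counts of significant regulatory events
      'protein': protein concentration time series, one list per replicate
    """
    discordant = all(r < 0 for r in gene["rate_ratio_corr"])
    opposing = gene["rna_events"] >= 1 and gene["protein_events"] >= 1
    constant = all(max_fold_change(rep) < max_fc for rep in gene["protein"])
    return discordant and opposing and constant

# Hypothetical gene whose protein level stays within 1.5-fold in both replicates.
flat_gene = {
    "rate_ratio_corr": [-0.6, -0.4],
    "rna_events": 1,
    "protein_events": 1,
    "protein": [[100, 110, 120, 105], [95, 100, 130, 115]],
}
# Same regulation pattern, but the protein level doubles in the first replicate.
changing_gene = dict(flat_gene, protein=[[100, 150, 200, 180], [95, 100, 130, 115]])
```

With these example records, `is_buffered(flat_gene)` is true and `is_buffered(changing_gene)` is false, mirroring how the additional constancy requirement reduces the candidate list.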

Protein expression regulation reaches a new steady state

After quantitating the overall contributions and direction of the regulatory changes, we set out to examine general temporal patterns of regulation. To do so, we constructed a clustered heatmap of median-centered RNA and protein rate ratios and calculated the average rate ratios across the six largest clusters (Fig 5). A stark contrast in coloring between consecutive columns indicates significant regulation of an individual gene: A change in synthesis and degradation rates results in a change in rate ratios between time intervals. Fig 5 shows a striking difference between the mRNA and protein level of regulation. For RNA-level regulation, many PECA rate ratios spike during the intermediate phase, resulting in significant changes at both the two- and eight-hour boundary time points. Before and after this interval, mRNA synthesis and degradation rates were relatively constant, with some exceptions during the late phase. We note that absence of regulation in the early time points is unexpected since, for example, many cells underwent apoptosis within the first two hours, suggesting that these processes may have occurred before our first measurement at 30 min. The pulse-like or transient behavior of the RNA-level regulation was confirmed both for the extended dataset (2,131 genes) and for the entire transcriptome (> 18,000 genes) (Appendix Figs S19 and S21), indicating that the high-confidence dataset delivers representative results. We observe strong spikes in extreme rate ratios between 2 and 8 h, with significant regulation leading into and out of this phase.

Figure 5. RNA- and protein-level regulations have different temporal modes

  1. Heatmap of RNA and protein rate ratios as computed by PECA, shown for the two replicates.
  2. The average rate ratios across six major clusters for both RNA (top) and protein (bottom). RNA rate ratios show a spike in their changes during the intermediate phase, while protein rate ratios change only once around the two-hour mark and remain at the new steady-state level throughout the remainder of the experiment. The clusters are defined in Dataset EV1.

Next, we analyzed the temporal behavior of protein-level regulation during our experiment. Similar to mRNA, little regulation occurred during the early phase, but it rapidly increased during the intermediate phase (Fig 5). However, in contrast to the pulse-like mode of RNA-level regulation, PECA showed that many protein rate ratios changed only once during the intermediate phase, in a switch-like or permanent manner, but then remained constant. This switch-like behavior is even more apparent when examining the averaged rate ratio changes across the different gene expression clusters (Fig 5, right). After the change at around 2 h, the protein rate ratios did not revert and stayed at the new level throughout the remainder of the experiment, indicating execution of the same protein synthesis and degradation rates that had been set earlier, without additional regulation. As can be seen in Fig 5 (right), the switch-like behavior applied to both up- and down-regulation and was independent of the mode of mRNA regulation. It is also present in the extended dataset (Appendix Fig S19). The PECA results confirmed what the concentration data had hinted at: While mRNA expression returned to the original values, protein-level regulation reached a new steady state.

PECA results help to generate hypotheses on regulatory modes

Finally, we examined three groups of genes in detail to illustrate how our analysis can detect signals that are otherwise hidden and help to generate hypotheses on possible regulatory modes. The first example group includes GRP78 (HSPA5, BiP; RCM = 0.87/0.97) and other chaperones (Fig 6A). As discussed above, up-regulation of GRP78 at both the mRNA and protein level is expected due to its crucial role during the ER stress response. It is tempting to hypothesize that its strong protein-level up-regulation might be mediated by the internal ribosome entry site in its 5′UTR. However, the validity of this hypothesis is still debated (Fernandez et al, 2002).

Figure 6. PECA identifies groups of similarly regulated genes

  1. Five chaperones, including GRP78, with mixed expression patterns.
  2. Eight subunits of ATP synthases observed in the experiment with mostly invariable RNA concentrations and increasing protein concentrations. PECA amplifies the hidden signal and identifies a significant protein-level regulation.
  3. Six aminoacyl-tRNA synthetases whose mRNA concentration increases temporarily, but the protein concentrations remain largely constant. PECA deconvolutes the two opposing regulatory effects that act at the RNA and protein levels.

Another gene in this group is HSC70 (HSPA8; RCM = 0.91/0.09), which is, similar to GRP78, a chaperone with pro-survival functions in the cell (Zhang et al, 2013). However, its protein expression pattern is different from that of GRP78 in that it remains constant across the time course. HSC70 is constitutively expressed and helps fold nascent protein chains. Under stress, it has been described to be slightly induced (Liu et al, 2012). In our dataset, we observe a significant drop in mRNA concentrations during the early phase of the experiment and a later recovery. Interestingly, this expression change is not transmitted to protein concentrations, but counterbalanced by a significant, transient up-regulation of protein expression. This behavior makes HSC70 one of the three examples of potential buffering discussed above (Appendix Fig S16).

Not only HSC70, but also HSP90AA1 and HSP90B1 serve as co-chaperones for the HSP90 proteins. HSP90B1 (GRP94, TRA1; RCM = 0.90/0.91) is localized to melanosomes and the ER and assists in protein folding. The protein appears to be regulated in two phases. After a short-term transcription increase (followed by transcription decline), protein production is augmented during the intermediate and late phases of the ER stress experiment. Finally, Fig 6A shows P58IPK (DNAJC3; RCM = 0.88/0.63), which is a member of the Hsp40 chaperone family and an inhibitor of the eIF2α kinase PERK. Due to this function, it is essential for translation re-start after the initial, ER stress-related translation shutdown (Roobol et al, 2015). An ER stress element in P58IPK's promoter region is known to activate the gene's transcription in response to ER stress (Yan et al, 2002). In our experiment, despite up-regulation at the mRNA level, protein concentrations are constant over the entire time course, suggesting homeostatic down-regulation at the protein level. However, this case did not qualify for buffering according to our criteria. The low P58IPK levels together with the continuous increase in GRP78 concentration (Yan et al, 2002) indicate that an ongoing ER stress response delayed return to normal translation in our experiment.

The second example group comprises 196 genes with invariable RNA concentrations, but whose protein concentrations increased during the late phase (Appendix Fig S14, Dataset EV1, cluster 8). Genes in this group are enriched in mitochondrial proteins, ATP biosynthesis, ribosomes, translation, and transmembrane proteins (FDR < 0.05). The ATP synthase genes are shown in Fig 6B. ATP synthases have essential roles in cellular ATP biosynthesis, and their increased activity likely boosts cellular ATP levels, which in turn helps provide the energy needed for the UPR. We identified eight subunits (ATP5B, C1, D, F1, H, I, L, O; average RCM = 0.50/0.61) with similar expression patterns. PECA of these genes shows how our tool can extract an otherwise hidden signal: PECA correctly identified a significant positive regulation at the protein level that results in an increase in absolute protein concentrations of the ATP synthase subunits.

To generate hypotheses on possible mechanisms for the up-regulation of these proteins, we collected > 160 sequence features, including length, signal sequences, nucleotide composition, amino acid composition, translation regulatory elements, RNA secondary structures, and post-translational modifications (Appendix Table S2). When testing this example group for biases across the features, we found a significant depletion in proline and glutamic acid, which are parts of PEST sequences that shorten protein half-lives, and in disordered regions, that is, COILS and REM465 (t-test, P < 0.0001), which are also known to destabilize proteins. Depletion in these two characteristics would stabilize the protein and would explain the up-regulated protein expression found by PECA.

The last example group contains 91 genes (Dataset EV1, cluster 3) that are characterized by an increase in both mRNA and protein concentrations and are significantly enriched in oxidoreductases and, interestingly, aminoacyl-tRNA synthetases, namely GARS, YARS, IARS, AARS, SARS, and EPRS (FDR < 0.05; average RCM = 0.88/0.21). The aminoacyl-tRNA synthetases are shown in Fig 6C and are examined in more detail in Appendix Figs S17 and S18. A number of the enzymes show a striking gene expression pattern in which protein synthesis is delayed by several hours compared to RNA synthesis. As this protein synthesis only occurs after mRNA concentrations have already decreased, the resulting final protein concentrations remain comparatively constant (Fig 6C). These cases did not qualify for “buffering” regulation, as they did not pass our filtering criteria.

However, post-transcriptional regulation of aminoacyl-tRNA synthetases has been observed before in other contexts (Kwon et al, 2011; Chen et al, 2012; Park et al, 2012; Guan et al, 2014; Wei et al, 2014). Its cellular role and underlying mechanism remained unknown until a recent publication delivered an intriguing explanation: Aminoacyl-tRNA synthetases express alternative splice variants that lack the catalytic domain but which often have additional “moonlighting” functions independent of their original role during translation (Lo et al, 2014). Based on these findings, we hypothesized that the discrepancy between mRNA and protein expression patterns for some genes might be explained by the differential expression of splice variants, and we examined the proteomics data manually for such examples (Appendix Figs S17 and S18). Unfortunately, as the proteomics experiment had not been designed to detect splice variants, only three enzymes (AARS, IARS, and QARS) provided enough information to draw some conclusions. While we detected for each of these three enzymes a set of sequence variants with differential expression, future work will have to confirm whether these alternative splicing events are indeed functional and affect the overall, averaged protein expression levels as observed in Fig 6C.

Control of the Cell Cycle

The length of the cell cycle is highly variable even within the cells of an individual organism. In humans, the frequency of cell turnover ranges from a few hours in early embryonic development to an average of two to five days for epithelial cells or an entire human lifetime spent in G0 by specialized cells such as cortical neurons or cardiac muscle cells. There is also variation in the time that a cell spends in each phase of the cell cycle. When fast-dividing mammalian cells are grown in culture (outside the body under optimal growing conditions), the length of the cycle is approximately 24 hours. The timing of events in the cell cycle is controlled by mechanisms that are both internal and external to the cell.

Regulation of ferritin by hormones, growth factors, and second messengers

Transcription of the human ferritin H gene is induced in response to both hormones and second messengers, including cAMP. The cis-acting elements mediating these responses have been mapped to a relatively small region in the proximal promoter of the human ferritin H gene (Figure 2).

Two groups identified ferritin H as a gene differentially expressed in response to thyrotropin in rodent cells.100,101 Subsequent work revealed that dibutyryl-cAMP recapitulated the effect of thyrotropin on ferritin H transcripts, albeit with different kinetics.102 Short fragments of the rat 5′ flanking region (up to 400 bp) but not longer fragments were responsive to dibutyryl-cAMP and thyrotropin in murine 3T3 cells and FRTL5 thyroid cells.103,104 Nuclear run-on assays confirmed the transcriptional effect of thyrotropin on ferritin H.105 The cAMP-dependent induction of ferritin was inhibited by ras in a rat thyroid cell line.106

Collectively, these experiments demonstrated that thyrotropin increased ferritin H transcription, probably by elevating cAMP. cAMP-mediated induction of ferritin H transcription was further defined in human HeLa cells.107 The human cAMP-responsive region (the B-box) binds a protein complex termed B-box binding factors (Bbf), comprised of the transcription factor NFY, the coactivator p300, and the histone acetylase p300/CBP associated factor (PCAF).108,109 The adenoviral oncogene E1A reduces the formation of this complex. Overexpression of p300 in HeLa cells reverses the E1A-mediated inhibition of the ferritin promoter driven by Bbf.110 Okadaic acid, a phosphatase inhibitor, stimulates H ferritin transcription in HeLa cells by increasing the interaction between the p300 coactivator molecule and other components of Bbf.111 In cells with low expression of human ferritin H, overexpression of the histone acetylase PCAF activates transcription from the B-box of ferritin H.112 The B-box may also mediate the increase in ferritin H mRNA that occurs during spontaneous differentiation of Caco-2 colon carcinoma cells108 and vascular smooth muscle.113 Other important regulatory elements in the human ferritin H gene include a region called the A-box at position −132, which contains an SP-1 consensus sequence.114

Since evaluation of the rodent cAMP regulatory region showed that longer promoter fragments exhibited a reduced rather than enhanced response to thyrotropin, these experiments also suggested the presence of negative cis-acting elements that may counteract the effect of cAMP and thyrotropin. Additional evidence for a negative regulator(s) of ferritin H transcription was obtained by Barresi et al.115 This group demonstrated that there is a stretch of 10 G's, which they termed “G-fer” between −272 and −291 of the human ferritin H gene. A 3-bp substitution mutation in this region increased promoter activity in HeLa cells, suggesting an inhibitory effect of this sequence on ferritin transcription. Inhibitory factor 1 (IF-1), which binds ubiquitously to G-rich sequences, was suggested to bind to this region.

Thyroid hormone may also regulate ferritin posttranscriptionally: T3 modulates the activity of IRP1, affecting its ability to bind to the ferritin IRE, possibly through induction of signal transduction cascades that result in phosphorylation of IRP1.116 T3 and TRH also induce the phosphorylation of IRP2.51

Similarities between the murine and human ferritin H gene highlight the conservation not only of ferritin function, but of ferritin regulation across species. The murine ferritin H gene contains similar elements to those described above; however, they are located almost 5 kb 5′ to the corresponding regulatory elements identified in the human ferritin H gene (see Figure 2). The murine ferritin H gene contains a basal enhancer FER1 that also binds p300 and is inhibited by E1A.117 Contained within FER1 is a region of dyad symmetry that binds SP1, like the A-box of the human ferritin H gene. However, to date the FER1 region has not been shown to respond to cAMP.

In addition to thyroid hormone, insulin and IGF-1 have also been implicated in regulation of ferritin at the mRNA level. Insulin and IGF-1 both induced mRNA for H and L ferritin in C6 glioma cells.118 There was no additive effect on ferritin induction when both hormones were combined at the optimal concentration of each, suggesting that insulin might be acting through the IGF-1 receptor.118 In contrast to the equal induction of ferritin H and L by insulin and IGF-1, in pancreatic cells high glucose caused selective induction of ferritin H mRNA, with a 4-fold to 8-fold increase in ferritin H mRNA, a 75% to 90% decrease in ferritin L, and an overall 3-fold increase in ferritin as assayed by immunostaining.119


ZH and SG: conceptualization. ZH, ML, UK, and SG: methodology. ZH, ML, UK, CP, and SG: validation. ZH, ML, and SG: formal analysis and writing—original draft preparation. SG, CP, and SH-C: resources. ZH, ML, SG, CP, and SH-C: writing—review and editing. SG: supervision and project administration. SG, ZH, CP, and ML: funding acquisition. All authors have read and agreed to the published version of the manuscript.

This study was in part supported by research grants by the Stiftung krebskranke Kinder—Regio basiliensis, the Stiftung pro UKBB, the University of Basel, and the China Scholarship Council (CSC, 201706920049). CP and ML acknowledged the Swiss National Science Foundation for financial support.

Scientists find cells rhythmically regulate their genes

A static picture of budding yeast cells expressing Msn2 (green) and Mig1 (red) fused to fluorescent proteins is shown. Each individual spot of green or red signal represents nuclear localization of the corresponding protein. Yellow color represents co-localization of the two proteins. Credit: Michael Elowitz and Yihan Lin/Caltech

Even in a calm, unchanging environment, cells are not static. Among other actions, cells activate and then deactivate some types of transcription factors—proteins that control the expression of genes—in a series of unpredictable and intermittent pulses. Since discovering this pulsing phenomenon, scientists have wondered what functions it could provide for cells.

Now, a new study from Caltech researchers shows that pulsing can allow two proteins to interact with each other in a rhythmic fashion that allows them to control genes. Specifically, when the expression of the transcription factors goes in and out of sync, gene expression also goes up and down. These rhythms of activation, the researchers say, may also underlie core processes in the cells of organisms from across the kingdoms of life.

"The way transcription factor pulses sync up with one another in time could play an important role in allowing cells to process information, communicate with other cells, and respond to stress," says paper coauthor Michael Elowitz, a professor of biology and biological engineering and an investigator with the Howard Hughes Medical Institute.

The research, led by Caltech postdoctoral scholar Yihan Lin, appears in the October 15 issue of Nature. Other Caltech authors of the paper are Assistant Professor of Chemistry Long Cai; Chang Ho Sohn, a staff scientist in the Cai lab; and Elowitz's former graduate student Chiraj K. Dalal (PhD '10), now at UC San Francisco.

Cai, Dalal, and Elowitz reported a functional role for transcription factor pulsing in 2008. In the meantime, researchers worldwide have been steadily uncovering similar surges of protein activity across diverse cell types and genetic systems.

Realizing that many different factors are pulsing in the same cell even in unchanging conditions, the Caltech scientists began to wonder if cells might adjust the relative timing of these pulses to enable a novel sort of time-based regulation. To find out, they set up time-lapse movies to follow two pulsing proteins and a target gene in real time in individual yeast cells.

The team tagged two central transcription factors named Msn2 and Mig1 with green and red fluorescent proteins, respectively. When the transcription factors are activated, they move into the nucleus, where they influence gene expression. This movement—as well as the activation of the factors—can be visualized because the fluorescent markers concentrate within the small volume of the nucleus, causing it to glow brightly, either green, red, or both. The color choice for the fluorescent tags was symbolic: Msn2 serves as an activator, and Mig1 as a repressor. "Msn2, the green factor, steps on the gas and turns up gene expression, while Mig1, the red factor, hits the brakes," says Elowitz.

When the scientists stressed the yeast cells (by adding heat, for example, or restricting food), the pulses of Msn2 and Mig1 changed their timing with respect to one another, with more or less frequent periods of overlap between their pulses, depending on the stressing stimulus.

Generally, when the two transcription factors pulsed in synchrony, the repressor blocked the ability of the activator to turn on genes. "It's like someone simultaneously pumping the gas and brake pedals in a car over and over again," says Elowitz.

But when they were off-beat, with the activator pulsing without the repressor, gene expression increased. "When the cell alternates between the brake and the gas—the Msn2 transcription factor in this case—the car can move," says Elowitz. As a result of these stress-altered rhythms, the cells successfully produced more (or fewer) copies of certain proteins that helped the yeast cope with the unpleasant situation.

Previously, researchers have thought that the relative concentrations of multiple transcription factors in the nucleus determine how they regulate a common gene target—a phenomenon known as combinatorial regulation. But the new study suggests that the relative timing of the pulses of transcription factors may be just as important as their concentration.
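The gas-and-brake picture can be illustrated with a toy model in Python. This is only a sketch and not the analysis used in the study: it assumes square pulses of fixed period and width, and it assumes the target gene is transcribed only in time steps where the activator is in the nucleus and the repressor is not. The function names, the pulse parameters and this on/off rule are all hypothetical.

```python
def pulse_train(n_steps, period, width, offset=0):
    """Return a list of 0/1 values; 1 marks time steps where the factor is nuclear."""
    return [1 if (t - offset) % period < width else 0 for t in range(n_steps)]


def expression_fraction(activator, repressor):
    """Fraction of time steps with the activator on and the repressor off (assumed rule)."""
    active = sum(1 for a, r in zip(activator, repressor) if a and not r)
    return active / len(activator)


# Hypothetical pulse parameters, chosen only for illustration.
N_STEPS, PERIOD, WIDTH = 1000, 20, 5

msn2 = pulse_train(N_STEPS, PERIOD, WIDTH)                     # activator ("gas")
mig1_sync = pulse_train(N_STEPS, PERIOD, WIDTH)                # repressor pulsing in phase
mig1_offbeat = pulse_train(N_STEPS, PERIOD, WIDTH, offset=10)  # repressor half a period late

in_sync = expression_fraction(msn2, mig1_sync)
off_beat = expression_fraction(msn2, mig1_offbeat)
print(f"synchronous pulses: {in_sync:.2f}")
print(f"off-beat pulses:    {off_beat:.2f}")
```

In both cases each factor spends the same total time in the nucleus, so the time-averaged concentrations are identical. In this toy setup only the relative timing changes the output.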

"Most genes in the cell are regulated by several transcription factors in a combinatorial fashion, as parts of a complex network," says Cai. "What we're now seeing is a new mode of regulation that controls the pulse timing of transcription factors, and this could be critical to understanding the combinatorial regulation in genetic networks."

"There appears to be a layer of time-based regulation in the cell that, because it can only be observed with movies of individual cells, is still largely unexplored," says Lin. "We look forward to learning more about this intriguing and underappreciated form of gene regulation."

In future research, the scientists will try to understand how prevalent this newfound mode of time-based regulation is in a variety of cell types and will examine its involvement in gene regulation systems. In the context of synthetic biology—the harnessing and modification of biological systems for human technological applications—the researchers also hope to develop methods to control such pulsing to program new cellular behaviors.

This article presents a review of commonly used vector-host systems for protein expression, based on the PDB database with protein expression information from over 30,000 publications and a Labome survey of randomly selected publications. The expression of toxic proteins is discussed in detail and the expression systems used in the production of pharmaceuticals are briefly summarized.

Figure 1 shows two typical protein expression workflows. One workflow leads to the generation of a purified protein. The other leads to the generation of a cell line expressing a recombinant protein. In real life, these two workflows may overlap if, for example, a stable mammalian cell line is to be used as the source material from which to purify a recombinant protein.

The advantages, disadvantages and potential applications of a number of the commonly used recombinant expression systems are listed in Table 1. A number of publications provide detailed information on these systems: Escherichia coli [1-6], Saccharomyces cerevisiae [4, 7-10], Pichia pastoris [5, 8, 9, 11, 12], baculovirus/insect cells [1, 5, 13, 14], mammalian cell lines [5, 15] and cell-free/in vitro protein production systems [16]. Table 1 summarizes the fundamental properties of these expression systems. Researchers are actively working to improve these fundamental properties (see [17] for a review). Several E. coli strains are now commercially available that aim to overcome the problems of codon bias (Rosetta 2 [18], CodonPlus ril/rp), inefficient disulfide bond formation (SHuffle, Origami [19, 20]) and poor expression of membrane proteins (C41 and C43). Similarly, E. coli expression vectors utilizing tags such as SUMO, maltose binding protein [21] and thioredoxin [21], designed to promote soluble expression, are commercially available. Pre-expression of sulfhydryl oxidase may markedly promote disulfide bond formation [22]. These approaches work very well for some proteins, but for others it is still not possible to obtain soluble, functional recombinant expression in E. coli [20, 23]. Proteins expressed in bacteria are contaminated with endotoxins, which can be removed before in vivo use with, for example, the ToxinEraser endotoxin removal kit from GenScript [24]. E. coli strains have even been engineered to perform protein N-glycosylation, though the efficiency is low [17, 25], or optimized for His-tagged protein expression (LOBSTR strain) [26, 27]. Likewise, approaches are being developed to express proteins with more mammalian-like glycosylation in the baculovirus/insect cell system and hence expand its utility [28]. Baculovirus variants that promote greater protein secretion are also being developed [17].
Staus DP et al minimized the number of cysteine residues in a truncated rat beta-arrestin 1 gene to increase its expression and stability in BL21/DE3 bacteria [29].

The critical problem facing the researcher is that it is still not possible to predict which expression system will work best for a particular protein and a specific end-use. A universally applicable expression system does not yet exist [30]. When selecting an expression system, the researcher should bear in mind the fundamental properties of each system, their pros and cons and how any particular limitations of that system can be overcome. Decisions should be informed by knowledge of the protein expression target/family members and the ultimate use of the recombinant protein. If resources permit, it may be prudent to explore two (or more) expression systems in parallel.

Expression system | Advantages | Disadvantages | Applications | Suppliers
Escherichia coli | Potentially very high expression levels; Low cost; Simple culture conditions; Rapid growth; Simple transformation protocols; Many parameters can be altered to optimize expression | Inefficient disulfide bond formation; Poor folding of proteins in the cytoplasm (inc. bacterial proteins); Inclusion body formation; In vitro refolding protocols inefficient, which may negate advantages; Codon usage different to eukaryotes; Minimal post-translational modifications; May not be able to express large proteins. Note: engineered strains can help alleviate the problems with disulfide bond formation (SHuffle and Origami), codon bias (Rosetta and CodonPlus ril/rp), or protein secretion [31] | Purified protein (structure, enzymology, drug discovery); Protein therapeutics | Invitrogen/Life Technologies; EMD Millipore; New England Biolabs
Saccharomyces cerevisiae | Good expression levels; Choice of secreted or cellular expression; Low cost; Simple culture conditions; Able to perform most eukaryotic post-translational modifications; Efficient protein folding | Likely lower expression than with Pichia pastoris; Secretion likely lower than with Pichia pastoris; Glycosylation still different to mammalian cells; A tendency to hyperglycosylate proteins; N-glycan structures considered allergenic | Purified protein (structure, enzymology, drug discovery); Protein therapeutics | Invitrogen/Life Technologies
Pichia pastoris | High expression levels; Low cost; Simple culture conditions; Relatively rapid growth; Choice of secreted or intracellular expression; Protein secretion efficient and allows simple purification; Extensive post-translational modification of proteins; Efficient protein folding; N-glycosylation more like higher eukaryotes than with Saccharomyces cerevisiae | Use of methanol as inducer is a safety (fire) hazard at scale; Glycosylation still different to mammalian cells | Purified protein (structure, enzymology, drug discovery) | Invitrogen/Life Technologies
Baculovirus-infected insect cells | Good expression levels (esp. for intracellular proteins); Relatively rapid growth; Efficient protein folding; Moderately scalable; Extensive post-translational modification of proteins; Glycosylation more like mammalian cells | | |

Interestingly, one structural genomics effort, the RIKEN Structural Genomics Initiative in Japan, has concentrated on the use of cell-free (in vitro transcription/translation) systems to generate proteins for its structure determination efforts [3, 36], which demonstrates the synthetic capacity of modern cell-free protein expression systems. However, cell-free systems tend to have low efficiency for proper folding.

Key aspects of protein expression for structural biology were reviewed in a 2013 special edition of Current Opinion in Structural Biology.

This article has focussed on the most commonly used expression vector-host systems. These are the systems that are likely to be the first port of call when planning to express a recombinant protein. However, it should be noted that many other, more esoteric, expression systems are available. These may be of interest to researchers with experience of protein expression, or in situations where the more ‘mainstream’ expression systems do not meet the needs of a particular study. By way of example, the following microbial/plant cell host systems have been described: yeast (Hansenula polymorpha, Arxula adeninivorans, Kluyveromyces lactis, Yarrowia lipolytica, Schizosaccharomyces pombe [8]), bacteria (Bacillus brevis, Bacillus megaterium, Bacillus subtilis and Caulobacter crescentus [2], Corynebacterium [37], the hyperthermophilic Sulfolobus islandicus [38]) and algae [39]. For a recent review of alternative/less common expression systems used for structural biology see [40]. The non-pathogenic Mycobacterium smegmatis has been used for the soluble expression of proteins from pathogenic Mycobacterium tuberculosis. This is of note as it is reported that the expression of mycobacterial proteins in E. coli can be problematic [41].

The following viral expression vectors are available for recombinant protein expression in mammalian cells: Semliki Forest virus [42], lentivirus [43], adenovirus [44] and adeno-associated virus. Semliki Forest virus has proved popular for the expression of membrane proteins for drug discovery and structural genomics [42]. Lentiviral and adenoviral vectors are currently of great interest in the field of gene therapy [45, 46]. There is also considerable interest in the expression of therapeutic recombinant proteins in the milk of transgenic animals [47]. Transposase-based systems, such as the Sleeping Beauty vector, are also gaining popularity for hyperactive constitutive expression [48].

There has been a substantial recent interest in developing systems for the recombinant expression of multi-protein complexes for both structural biology and drug discovery/development [49-52].

The Worldwide Protein Data Bank (wwPDB) is an international collaboration of four organisations: RCSB PDB (USA), MSD-EBI (Europe), PDBj (Japan) and BMRB (USA). The wwPDB is a repository of macromolecular structural data whose "mission is to maintain a single PDB archive of macromolecular structural data that is freely and publicly available to the global community" [53, 54]. The vast majority of the structures in the wwPDB are of proteins.

To understand current practice in protein expression, we selected the PDB entries with publications within the last 10 years (from 2009 onwards), resulting in 27096 articles, which correspond to 64713 records.

The majority of the wwPDB entries cited Escherichia coli as the expression host, with 23041 out of 27096 publications (85%) reporting its use. The most common strain is BL21 (11628 articles).

Table 2 lists the top 5 expression hosts and the top 2 or 3 expression vectors for each of these host organisms.

Expression host | Most commonly used expression vectors | Publications
Escherichia coli | | 23041
 | pET28 and derivatives (Novagen/EMD Millipore) | 2679
 | pET15 (Novagen/EMD Millipore) | 863
 | pET21 (Novagen/EMD Millipore) | 735
Spodoptera frugiperda | | 1493
 | pFastBac (Invitrogen/Life Tech) | 228
 | pVL1392/3 (BD Bioscience) | 27
 | pFB-LIC-Bse | 16
Homo sapiens | | 1181
 | pHL-sec | 92
 | pVRC8400 | 42
 | pTT5 | 17
Trichoplusia ni | | 532
 | pFastBac | 76
 | pAcGP67 | 15
Cricetulus griseus (CHO) | | 245
 | pEE series | 12

It is noteworthy that Escherichia coli is the most common expression host in the wwPDB dataset by a very substantial margin. Escherichia coli is a Gram-negative, rod-shaped bacterium. It is one of the key model organisms in life science research and has been extensively exploited in both academic and industrial settings. It should be noted that the lipopolysaccharides in its outer membrane are the source of endotoxin, which may elicit severe inflammatory responses in cellular and in vivo experimental models.

Spodoptera frugiperda cells used for protein expression are cell lines (Sf9 and Sf21) derived from the ovarian tissues of the fall armyworm. Novavax expressed a modified SARS-CoV-2 spike protein in Sf9 cells to produce a COVID-19 vaccine [55]. Trichoplusia ni cells (available commercially as High Five) are derived from cabbage looper ovary cells. Trichoplusia ni cells are reported to be better than Sf9 cells for the production of secreted proteins using the baculovirus/insect cell system [56]. They are also superior to Sf9 cells as a host for the production of virus-like particles for recombinant vaccine production [57].

Pichia pastoris is a respiratory, methylotrophic yeast that can utilize methanol as its sole carbon and energy source. For example, Kitchen P et al expressed full-length AQP4 protein in Pichia pastoris [58] and D Wrapp et al expressed bivalent VHHs in the pKai61 vector in Pichia pastoris [59].

Cricetulus griseus cell lines are derived from Chinese hamster ovary (CHO) cells. This extensively used cell line can be adapted to suspension growth. Most recombinant antibodies are produced in CHO cell lines. Maun HR et al, for example, expressed human alpha- and beta-tryptase genes in a CHO DKO cell line [60].

For each expression host, the top 2 or 3 most frequently cited expression vectors account for only a relatively small proportion of the total publications (Table 2). This reflects the wide range of different expression vectors that have been used for each expression host. Figures 1 and 2 show plasmid maps for pET28 and pcDNA3.3 (the latest version of pcDNA3), respectively. These plasmids are the archetypal expression vectors for E. coli and mammalian cells, respectively.
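The proportions discussed above can be checked with a short calculation. The Python sketch below copies the publication counts from Table 2 and computes each host's share of the 27096 articles, plus the combined share of its listed vectors. The variable names are illustrative. It also assumes that no publication is counted under more than one listed vector, which the source does not state, so the vector shares should be read as approximate.

```python
# Publication counts copied from Table 2 (wwPDB dataset).
TOTAL_ARTICLES = 27096

# host -> (publications citing the host, {listed vector: publications})
hosts = {
    "Escherichia coli": (23041, {"pET28 and derivatives": 2679, "pET15": 863, "pET21": 735}),
    "Spodoptera frugiperda": (1493, {"pFastBac": 228, "pVL1392/3": 27, "pFB-LIC-Bse": 16}),
    "Homo sapiens": (1181, {"pHL-sec": 92, "pVRC8400": 42, "pTT5": 17}),
    "Trichoplusia ni": (532, {"pFastBac": 76, "pAcGP67": 15}),
    "Cricetulus griseus (CHO)": (245, {"pEE series": 12}),
}


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


summary = {}
for host, (n_pubs, vectors) in hosts.items():
    host_share = percent(n_pubs, TOTAL_ARTICLES)
    # Assumption: publications are not double-counted across the listed vectors.
    top_vector_share = percent(sum(vectors.values()), n_pubs)
    summary[host] = (host_share, top_vector_share)
    print(f"{host}: {host_share:.1f}% of articles; "
          f"listed vectors cover {top_vector_share:.1f}% of this host's publications")
```

Under that assumption, the listed vectors cover less than a fifth of the publications for every host in Table 2.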

With pET vectors, the T7 RNA polymerase promoter drives expression of the recombinant gene. The pET28 plasmid encodes an N-terminal His-tag/thrombin cleavage site/T7-tag sequence and an optional C-terminal His-tag sequence. These vectors are used with lambda DE3 lysogen strains of E. coli. In these strains, expression of a genomic copy of the T7 RNA polymerase gene is under the control of the lac repressor. Expression of the recombinant protein is induced by the addition of isopropyl-β-D-thiogalactoside (IPTG) to the culture medium. Interestingly, vectors in which expression is induced by other molecules (e.g., arabinose) do not feature prominently in the wwPDB dataset.

With the pcDNA3 series of plasmids, expression is driven by the immediate early promoter of human cytomegalovirus (CMV). This is a strong promoter, constitutively active in mammalian cells. pcDNA3 is the original version of this series of vectors and is no longer commercially available. The latest development of the series is pcDNA3.3. It should be noted that the pcDNA3 series was also identified as among the most commonly used mammalian expression vectors in a survey of formal publications (for example, pcDNA3.1 [61, 62] or pcDNA3.3 from Invitrogen (K8300-01) [63]).

The most commonly used baculovirus/insect cell vectors all utilize the strong polyhedrin promoter to drive constitutive expression of the recombinant protein. The plasmids (baculovirus transfer vectors) do not themselves directly drive protein expression. They are used to generate recombinant baculovirus containing the gene of interest under the control of the polyhedrin promoter. pFastBac is a newer generation of plasmids that utilize site-specific transposition to generate recombinant baculovirus. This reduces the time taken to generate recombinant baculovirus to around 2 weeks compared to the 4-6 weeks required with older generation plasmids such as pVL1392/3 and pAcGP67.

The Pichia pastoris vectors both utilize the strong AOX1 promoter. The expression is induced by methanol. Despite Pichia pastoris being an excellent system for the production of secreted proteins (see below), the two most commonly used Pichia pastoris expression vectors in the wwPDB dataset are both designed for cytoplasmic expression.

The data in Table 2 correspond to expression hosts and expression vectors that were specifically employed for producing proteins for structural studies. Consequently, this dataset is biased towards those expression vectors/hosts that are capable of generating large amounts of purified proteins that are required for 3-dimensional structure determination studies. The inability of Escherichia coli to glycosylate proteins and the relative ease with which insect cell N-glycosylation can be removed enzymatically (see Table 1) may also contribute to the frequent use of these systems in the wwPDB dataset. Unglycosylated proteins are generally preferred for structural studies, unless the sugar molecules are essential for function.

For other applications (such as enzyme assays, cellular assays, production of antigen for antibody generation, over-expression to study cellular function or localization) it may not be necessary to express and purify such substantial quantities of protein. Indeed, for cell-based studies purification of the recombinantly-expressed protein is unlikely to be a consideration. Nonetheless, the data on expression vectors/hosts obtained from the wwPDB is a precious resource. The data can help guide protein expression projects for many applications.

To avoid the potential bias of the wwPDB dataset, Labome surveyed a randomly selected set of formal publications that cited plasmids. The top 3 most commonly used groups of expression vectors are shown in Table 3. As observed for the wwPDB dataset, the most commonly cited expression vectors account for only a small proportion of the total publications. Again, this reflects the diversity of expression vectors that researchers use for any given expression host. Vector pcDNA3.1 drives expression in mammalian cells via the constitutive CMV promoter (see above). Plasmid pGL3 is a luciferase reporter vector designed for the quantitative study of the regulation of mammalian gene expression. The increasing popularity of this vector presumably reflects a growing desire of researchers to study the function of the human genome at both the transcriptional and proteomic levels.

Similarly, pEGFP is a mammalian expression vector in which expression is driven constitutively by the CMV promoter. Enhanced green fluorescent protein (EGFP) is expressed as either an N-terminal (pEGFP-C1) or a C-terminal (pEGFP-N1) fusion with the protein of interest. These pEGFP vectors may be used to study the subcellular localization or trafficking of proteins by monitoring the EGFP fluorescence [64, 65]. Other vectors, such as the pRK5 expression vector [66], are also utilized.

Vector | PMID | Host | Common variant | Reference
pcDNA3.1 (Invitrogen) | 83 | Mammalian cell lines | pcDNA 3.1 His; pcDNA 3.1 V5 | [67, 68]
pGL3 (Promega) | 44 | Mammalian cell lines | | [68]
pEGFP (Clontech) | 40 | Mammalian cell lines | pEGFP-C1 |

It is notable that the most commonly used mammalian expression vectors, both in the publication survey and in the wwPDB dataset, drive constitutive expression. This is despite the commercial availability of inducible mammalian expression vectors such as the T-REx system (for example, Thermo Fisher Flp-In T-REx 293 cells, used in the production of ApoE3 proteins [70] and other proteins [67]), pGEX vectors from GE Healthcare [71], and the pF12 RM Flexi system. Although it does not feature highly in academic publications, the BacMam system has proven popular in the pharmaceutical industry for expressing proteins both for cellular studies and for purification [72-74] and is now commercially available. BacMam utilizes a modified baculovirus in which the usual promoter is replaced with the mammalian cell-active CMV promoter. The BacMam virus drives non-replicative, non-lytic expression in a wide range of mammalian cell types.

Unlike in research settings, newly approved protein pharmaceuticals are often produced in mammalian expression systems. Gary Walsh summarized the expression systems used in the production of protein pharmaceuticals approved by US/EU regulatory authorities from Jan 2014 to Dec 2018 [79]. Fifty-two out of 62 novel recombinant protein pharmaceuticals are expressed in mammalian cell lines, one (Sebelipase alfa) in a transgenic animal system, 5 in E. coli, and 4 in S. cerevisiae. CHO cell-based systems are the most common mammalian expression host. Among the 68 monoclonal antibody drugs (novel or biosimilar) approved during the same period, 57 are produced in CHO cell lines, 9 in NS0 cells and 2 in Sp2/0 cells.
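To put these counts on a common scale, they can be converted to percentages. The Python sketch below uses only the numbers quoted in the paragraph above. The dictionary keys and the helper name `shares` are illustrative labels, not terms from the cited summary.

```python
# Counts quoted above for approvals from Jan 2014 to Dec 2018.
novel_proteins = {
    "mammalian cell lines": 52,
    "transgenic system": 1,
    "E. coli": 5,
    "S. cerevisiae": 4,
}
antibodies = {"CHO": 57, "NS0": 9, "Sp2/0": 2}


def shares(counts):
    """Return the total count and each category's percentage of that total."""
    total = sum(counts.values())
    return total, {name: 100.0 * n / total for name, n in counts.items()}


protein_total, protein_shares = shares(novel_proteins)
antibody_total, antibody_shares = shares(antibodies)

for label, total, pct in (
    ("Novel recombinant proteins", protein_total, protein_shares),
    ("Monoclonal antibodies", antibody_total, antibody_shares),
):
    print(f"{label} (n = {total})")
    for name, value in pct.items():
        print(f"  {name}: {value:.1f}%")
```

The category counts sum to the stated totals of 62 and 68. Mammalian cell lines account for about 84% of the novel proteins, and CHO cells for about 84% of the antibodies.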

A frequently encountered problem is the expression of recombinant proteins that are toxic to the host cells in which they are expressed. Several strategies are available to overcome this issue. The researcher generally has to empirically determine which potential solution works best for their particular protein. Literature precedent, target class knowledge and ‘in-house’ experience of the target protein can all be used to guide the choice of strategy.

  • Tightly regulated (i.e., non-leaky) expression systems such as the pBAD system utilizing the araBAD promoter (Invitrogen/Life Technologies) can be used to minimize basal expression [80].
  • With T7 promoter-based plasmids, pre-induction expression can be reduced by the use of pLysS/pLysE/pLysY host cells expressing T7 lysozyme, which inhibits T7 RNA polymerase and thus reduces promoter activity prior to IPTG induction. Similarly, glucose can be used to repress promoter activity prior to induction with IPTG [80].
  • The pETcoco™ system (EMD Millipore) allows plasmid copy number to be maintained at a very low level during cell growth, thus minimizing basal expression and maximizing plasmid stability prior to induction. Plasmid copy number is markedly upregulated and target gene expression induced by IPTG [80].
  • Another approach is to use host strains, such as C41(DE3) and C43(DE3), empirically selected for their ability to express toxic proteins more effectively than the parental BL21(DE3) strain (Avidis, Lucigen) [80].
  • Empirically screening fusion tags such as maltose binding protein, GST, thioredoxin or SUMO may identify a fusion partner that overcomes the toxicity of the target protein (Invitrogen/Life Technologies, New England Biolabs, LifeSensors) [80].
  • Directing expression to the periplasm can potentially overcome toxicity associated with cytosolic accumulation [80].
  • Fed-batch culture may also be a useful approach to the expression of toxic proteins [81].

Many mammalian expression systems use a constitutively active CMV promoter. This is problematic for the expression of proteins that are toxic to the host cells. However, researchers frequently wish to study the cellular function of such proteins or wish to express and purify such proteins bearing full mammalian cell post-translational modifications. Multiple inducible mammalian expression systems, utilizing the strong CMV promoter, are now commercially available in which expression is induced by tetracycline (T-REx™, Invitrogen/Life Technologies; Tet-On 3G, Clontech), ecdysone (Agilent Technologies/Stratagene) and IPTG (pTUNE, Origene). These systems facilitate the growth of sufficient cell numbers prior to the induction of the target protein. Nakagawa T expressed a DualTetONGluA2-FLAG/CNIH3-1D4 plasmid in HEKTetON cells (Clontech) to alleviate the toxicity caused by activation of the ion channel GluA2 [82].

If one particular expression system fails, it may be advantageous to switch to a different system (e.g., yeast, insect, bacterial, mammalian) if other considerations allow (e.g., end use, post-translational modifications). A protein that is toxic in one system may not be toxic in another [80].

If relatively small quantities of (purified) protein are required then cell-free protein production is an attractive option to circumvent issues of cellular toxicity [80].