Like S. cerevisiae, S. pombe is a model organism with its own large community of researchers. Pombase serves as the central database for information on S. pombe, functioning much like the SGD. Access Pombase at http://www.pombase.org
- Enter the name of the S. pombe homolog that you obtained from Homologene.
- Record the systematic name for your gene, which refers to the position of the gene on the chromosome. Systematic names begin with SPAC, SPBC or SPCC, corresponding to chromosomes I, II, and III, respectively.
- Pombase stores all the information on a single page that is divided into fields. Individual fields can be expanded or minimized with an arrow by the field name. Quick links on the lower right also help with navigation.
- Navigate to the Transcript field. The graphic will indicate whether the homolog contains an intron. The field also contains information about the exon/intron boundaries and information on 5′- and 3′-UTRs in mRNAs, when that information is available. Does the homolog of your MET gene contain introns?
- Spend a little time seeing what kind of information (e.g. gene/protein size, function, protein interactions) is available for your S. pombe ortholog. You may find Pombase to be a helpful resource when writing lab reports.
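The two rules used in the steps above (the prefix convention that ties a systematic name to a chromosome, and reading introns off the exon boundaries shown in the Transcript field) can be sketched in a few lines of Python. This is a minimal illustration, not a Pombase tool: the function names, the 1-based inclusive coordinate convention, and the example identifiers and coordinates are all hypothetical, and identifiers that do not begin with SPAC, SPBC or SPCC are simply reported as unknown.

```python
# Prefix rule from the exercise: SPAC -> chromosome I, SPBC -> II, SPCC -> III.
PREFIX_TO_CHROMOSOME = {"SPAC": "I", "SPBC": "II", "SPCC": "III"}


def chromosome_from_systematic_id(systematic_id: str):
    """Return the chromosome (as a Roman numeral) implied by a systematic name, or None."""
    for prefix, chromosome in PREFIX_TO_CHROMOSOME.items():
        if systematic_id.upper().startswith(prefix):
            return chromosome
    return None  # prefixes outside the three listed above are not handled here


def introns_from_exons(exons):
    """Given (start, end) exon coordinates (1-based, inclusive), return the introns between them."""
    ordered = sorted(exons)
    # Each intron runs from one base after an exon ends to one base before the next exon starts.
    return [(left_end + 1, right_start - 1)
            for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])]


# Hypothetical identifier and exon coordinates, for illustration only.
print(chromosome_from_systematic_id("SPBC0000.02"))
print(introns_from_exons([(1, 120), (181, 400), (461, 900)]))
```

A gene with a single exon returns an empty intron list, which corresponds to a Transcript graphic with no intron.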
DNA replication through hard-to-replicate sites, including both highly transcribed RNA Pol II and Pol III genes, requires the S. pombe Pfh1 helicase
Replication forks encounter impediments as they move through the genome, including natural barriers due to stable protein complexes and highly transcribed genes. Unlike lesions generated by exogenous damage, natural barriers are encountered in every S phase. Like humans, Schizosaccharomyces pombe encodes a single Pif1 family DNA helicase, Pfh1. Here, we show that Pfh1 is required for efficient fork movement in the ribosomal DNA, the mating type locus, tRNA, 5S ribosomal RNA genes, and genes that are highly transcribed by RNA polymerase II. In addition, converged replication forks accumulated at all of these sites in the absence of Pfh1. The effects of Pfh1 on DNA replication are likely direct, as Pfh1 bound at high levels to sites whose replication was impaired in its absence. Replication in the absence of Pfh1 resulted in DNA damage specifically at those sites that bound high levels of Pfh1 in wild-type cells and whose replication was slowed in its absence. Cells depleted of Pfh1 were inviable if they also lacked the human TIMELESS homolog Swi1, a replisome component that stabilizes stalled forks. Thus, Pfh1 promotes DNA replication and separation of converged replication forks and suppresses DNA damage at hard-to-replicate sites.
DNA replication is a fundamental process that must occur with high efficiency and precision. Schizosaccharomyces pombe is an excellent model for replication studies because its genome organization is very similar to that of higher eukaryotes. In eukaryotes, replication initiates at multiple origins along the chromosome, with replication proceeding bidirectionally from these origins. However, a few chromosomal regions are replicated unidirectionally due to naturally occurring replication fork barriers (RFBs) that pause or stall replication forks moving in one direction through the site. The ribosomal DNA (rDNA) and the mating type locus are two S. pombe genomic locations that replicate unidirectionally (Sanchez et al. 1998; Dalgaard and Klar 2000; Krings and Bastia 2004). Both regions contain cis-acting sequences that are maintained by the nonnucleosomal Swi1/Swi3 (human TIMELESS/TIPIN) complex and function as natural RFBs (Dalgaard and Klar 2000; Krings and Bastia 2004). The S. pombe Swi1/Swi3 complex migrates with the replisome, and the human TIMELESS/TIPIN homologs interact with replisome components (Noguchi et al. 2004; Gotter et al. 2007).
S. pombe rDNA, which is located in two clusters, one at each end of chromosome III, consists of 100–150 copies of 10.9-kb rDNA repeats. The RFBs in each rDNA repeat pause replication forks moving in the direction opposite to transcription, ensuring that transcription and replication move in the same direction through these highly transcribed genes (Sanchez et al. 1998). Sap1 and Reb1 are two additional rDNA RFB-binding proteins (Sanchez-Gorostiaga et al. 2004; Krings and Bastia 2005; Mejia-Ramirez et al. 2005). There are four RFBs in each rDNA repeat: Ter1, Ter2, Ter3, and RFB4. Reb1 binds Ter2–3, and Sap1 binds Ter1. sap1+, but not reb1+, is required for cell viability (Arcangioli et al. 1994). The pausing at Ter1–3 is abolished in swi1 or swi3 mutant cells (Krings and Bastia 2004), but the requirements for fork pausing at RFB4 have not been characterized. As in Xenopus laevis and human cells (Little et al. 1993; Wiesendanger et al. 1994), the barrier activity at rDNA in S. pombe (Sanchez et al. 1998) is not as efficient as in Saccharomyces cerevisiae (Brewer and Fangman 1988; Linskens and Huberman 1988). In addition, similar to human cells, S. pombe 5S ribosomal RNA (rRNA) genes are not located in the rDNA cluster but are dispersed throughout all three chromosomes (Wood et al. 2002).
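The copy number and repeat length given above imply the approximate size of the rDNA arrays and the number of barrier sites a fork can meet there. A back-of-the-envelope calculation, assuming that every repeat is a full 10.9-kb unit carrying all four RFBs (Ter1, Ter2, Ter3, and RFB4):

```latex
\begin{aligned}
\text{total rDNA} &= (100 \text{ to } 150) \times 10.9\ \text{kb} = 1.09 \text{ to } 1.635\ \text{Mb},\\
\text{total RFBs} &= (100 \text{ to } 150) \times 4 = 400 \text{ to } 600.
\end{aligned}
```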
The mat1 locus, located on chromosome II, encodes the mating type of the cell. At this locus, replication progresses from the telomeric side of mat1 toward the centromere. Replication from the opposite direction is blocked by the Swi1, Swi3, Rtf1, and Rtf2 complex bound to the replication termination sequence (RTS1), which allows proper imprinting during mating type switching (Dalgaard and Klar 2001). In S. cerevisiae, replication forks also show modest fork pausing at tRNA genes (Deshpande and Newlon 1996) and highly transcribed RNA polymerase II (Pol II) genes (Azvolinsky et al. 2009). All of these replication pause sites—the rDNA RFB, the 5S rRNA genes, the mating type RTS1, tRNA genes, and highly transcribed Pol II genes—are associated with stable protein complexes.
Pif1 enzymes comprise a 5′–3′ DNA helicase family that is found in essentially all eukaryotes and some prokaryotes (for review, see Bochman et al. 2010, 2011). These helicases perform major roles in both nuclear and mitochondrial DNA maintenance. S. cerevisiae and several other fungi encode two Pif1 family members. In contrast, S. pombe, like humans and most other metazoans, encodes a single Pif1 family helicase, Pfh1. So far, most of the information on Pif1 family helicases comes from studies on the two S. cerevisiae Pif1 proteins, ScPif1 and ScRrm3. Although ScPif1 and ScRrm3 affect replication of many of the same substrates, their effects on these substrates are different and sometimes even opposing. For example, at the rDNA, ScPif1 impedes, while ScRrm3 promotes, fork progression through the RFB (Ivessa et al. 2000). Likewise, ScPif1, but not ScRrm3, is important for maintenance of mitochondrial DNA (Foury and Kolodynski 1983; O'Rourke et al. 2005; Cheng et al. 2007, 2009). However, deletion of RRM3 in pif1Δ cells partially suppresses the high loss rate of mitochondrial DNA that occurs in the absence of ScPif1 (O'Rourke et al. 2005). ScRrm3 promotes semiconservative replication at telomeres, while ScPif1 inhibits telomerase (Zhou et al. 2000; Ivessa et al. 2002). ScRrm3 also promotes fork progression at other sites where ScPif1 is not known to act, such as tRNA genes, inactive origins, and the silent mating type loci, while ScPif1 is important for replication through G-rich motifs that can form G-quadruplex structures in vitro (Ivessa et al. 2003; Paeschke et al. 2011). Human Pif1 (hPif1) also unwinds G-quadruplex structures in vitro, as well as DNA structures that resemble stalled replication forks (George et al. 2009; Sanders 2010). Although S. cerevisiae Pif1 family helicases have important roles in DNA replication, pif1Δ rrm3Δ cells are viable (Ivessa et al. 2000). In contrast, S. pombe pfh1+ is essential in both mitochondria and nuclei because of its key function in replication of both mitochondrial and nuclear DNA (Tanaka et al. 2002; Zhou et al. 2002; Pinter et al. 2008). Thus, an outstanding question in the field is how a single Pif1 family helicase performs roles that are split between two helicases, and often opposed, in S. cerevisiae. This question is compelling given the recent finding that several families with a high incidence of breast cancer carry hPIF1 mutations in a highly conserved residue (Chisholm et al. 2012). Moreover, the same mutation in Pfh1 results in a stable but inactive protein that is unable to provide either its nuclear or mitochondrial functions (Chisholm et al. 2012).
Although Pfh1 is required for chromosome replication (Pinter et al. 2008), its mechanism of action is unknown. Here, we tested the hypothesis that Pfh1 promotes chromosomal DNA replication at specific hard-to-replicate sites, such as RFBs and highly transcribed RNA Pol II and Pol III genes. We show that replication at all of these sites was impaired in Pfh1-depleted cells. We provide evidence that it is the stable protein complexes at the RFBs within the mating type locus and rDNA that made replication Pfh1-sensitive. Moreover, in the absence of Pfh1, genome integrity was compromised at those sites where its activity was needed for efficient replication, as DNA damage increased at these sites in Pfh1-depleted cells but not at sites that did not show Pfh1-sensitive DNA replication. These data support the hypothesis that Pfh1 has a ScRrm3-like role in promoting fork movement past stable protein complexes, thereby facilitating semiconservative DNA replication. However, Pfh1 is so far unique among eukaryotic DNA helicases in being required for efficient replication fork progression through highly transcribed RNA Pol II genes.
Whole-genome sequencing has enabled investigations into the gene content of many living organisms and forms the foundation for further study of gene expression, proteomics and epigenetics. After assembly of a novel genome, gene annotation is often the first step in analysing the gene content of an organism. Accurate annotation of the exonic structure of genes is crucial to the success of all subsequent functional and comparative analyses.
Problems that can potentially be caused by incorrect gene annotation are numerous and can lead to incorrect assessments of the lifestyle and ecology of an organism. In comparative genomics, where orthologous genes or conserved functional domains are compared between species or isolates, the estimated numbers of such genes or domains can be distorted by imperfect annotations (as described by Hane et al., S Text 1). Prediction of extracellular secretion, which relies on a short signal peptide at the N-terminus, can miss secreted proteins if the start codon of a gene has been incorrectly annotated: mis-annotating the start of protein translation can either cut off the signal peptide or bury it within the annotation. Although this may seem a benign annotation error, the consequences for downstream research could be detrimental, particularly as the biotic interactions and industrial applications of microbes are largely determined by their secretomes. Additionally, translated protein sequences of novel species are often submitted to databases such as NCBI and UniProt. It is commonplace to use these database entries to support the annotation of related species or isolates, meaning errors present in the pioneer annotation may be repeated. When these new annotations based on false assumptions are added to databases, there is not only a propagation of errors, but also a perceived strengthening of homology evidence for incorrect protein sequences.
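To make the start-codon problem concrete, the sketch below translates a toy coding sequence twice: once from the correct start codon and once from a downstream in-frame ATG that an annotation might wrongly choose. The sequence is invented for illustration (its run of leucines merely stands in for a hydrophobic signal-peptide region), and the translation uses the standard genetic code; it is not part of any annotation pipeline described here.

```python
from itertools import product

# Standard genetic code; codons are enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_from(seq: str, start: int) -> str:
    """Translate seq codon by codon from index `start` until the first stop codon ('*')."""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical coding sequence: true ATG at index 0, a second in-frame ATG at index 18.
toy_cds = "ATG" + "AAA" + "CTG" * 4 + "ATG" + "GCA" * 2 + "TAA"

correct = translate_from(toy_cds, 0)     # full-length protein
truncated = translate_from(toy_cds, 18)  # start codon mis-annotated downstream
print(correct, truncated, len(correct) - len(truncated))  # prints: MKLLLLMAA MAA 6
```

Here the six N-terminal residues, including the hydrophobic stretch, are silently lost, which is the situation in which a signal-peptide predictor would fail to flag a secreted protein.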
In recent years, correction of in silico predicted gene annotations with RNA-seq derived transcripts and read alignments has enabled vastly improved genome annotations and corrections of annotated gene structures [4-6]. Short read and/or assembled transcript alignments are typically used to correct the coordinates of intron-exon boundaries in existing gene annotations or predictions, to train gene predictors, and can also be incorporated directly into gene prediction by hybrid gene predictors [9,10]. Since their initial application to gene prediction, generalised hidden Markov models (GHMMs) have played an important role in genome annotation. Various GHMM gene predictors [12-15] continue to be incorporated into annotation pipelines [16-18], some of which are capable of making use of RNA-seq data. For example, AUGUSTUS [9,10,14] allows the user to generate hint files from RNA-seq read/transcript alignments that are then used to improve prediction accuracy. More recently, a new version of GeneMark-ES, named GeneMark-ET, allows the incorporation of RNA-seq data into its automated gene model training. These gene finders are both applicable to a broad range of eukaryotic genomes. A number of pipelines have also been developed that utilise available gene prediction software and RNA-seq data to generate annotations. Some examples of such pipelines are Maker [16,19], EVidenceModeler, JAMg, SnowyOwl and the insect genome annotation pipeline OMIGA. The continued development of pipelines such as these relies on the availability and development of component software such as GHMM gene predictors.
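The core idea behind HMM-based gene finders, labelling each base with a hidden state and choosing the most probable path of states, can be shown with a deliberately tiny model. The sketch below is a plain two-state HMM decoded with the Viterbi algorithm, not a generalised HMM: real GHMM predictors such as AUGUSTUS also model state durations, codon phase, splice sites and, optionally, RNA-seq hints. The states ('n' for non-coding, 'c' for coding), the GC-biased emission probabilities and the transition probabilities are all invented for illustration.

```python
import math

# Toy two-state HMM: 'n' = non-coding (AT-rich), 'c' = coding (GC-rich). All numbers are invented.
STATES = ("n", "c")
START = {"n": 0.5, "c": 0.5}
TRANS = {"n": {"n": 0.9, "c": 0.1}, "c": {"n": 0.1, "c": 0.9}}
EMIT = {
    "n": {"A": 0.35, "T": 0.35, "G": 0.15, "C": 0.15},
    "c": {"A": 0.15, "T": 0.15, "G": 0.35, "C": 0.35},
}


def viterbi(seq: str) -> str:
    """Return the most probable state path (one label per base), computed in log space."""
    # score[s]: best log-probability of any path ending in state s at the current base
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []  # backpointers[t][s]: best predecessor of state s at base t + 1
    for base in seq[1:]:
        step, new_score = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            step[s] = prev
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
        backpointers.append(step)
        score = new_score
    # Trace back from the highest-scoring final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for step in reversed(backpointers):
        state = step[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("ATATAT" + "GCGCGCGC" + "ATATAT"))  # nnnnnnccccccccnnnnnn
```

The switching penalty (0.1 versus 0.9) is what stops the decoder from flipping state on every base; a GHMM replaces this geometric length model with explicit length distributions for each feature type.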
Fungal genomics has applications in areas such as agriculture [22-24], medicine [25,26], biomass conversion [27,28] and food/beverage production [29,30]. This broad industry relevance and the continued growth in the number of new fungal species with sequenced genomes emphasise the importance of fungal gene annotation. Fungal genomes differ from those of higher eukaryotes in that they are gene dense with short introns [31,32]. They also exhibit less alternative splicing than other eukaryotes, with a higher proportion of mRNA isoforms arising from retained introns. Manual annotation is considered to be the most reliable method of producing a high-quality genome annotation, but it is time consuming and can be a bottleneck in genome studies. Consequently, fungal genome annotations are typically derived from ab initio predictions, spliced EST/transcript alignments and protein homology. For many fungi, closely related species have either not been sequenced or their genomes have not been annotated in detail. This can mean that sets of homologous proteins for use in protein homology annotation are either small or unreliable. In such cases, gene prediction relies more on EST/transcript alignments and ab initio predictions.
Currently available gene prediction software and pipelines are typically intended for application across a broad range of eukaryotes, with comparatively few being specific to fungi. GipsyGene is a GHMM gene predictor that was developed for fungi, with particular attention given to modelling fungal introns correctly. A version of GeneMark-ES, a self-training GHMM, also uses an intron model designed for fungi. However, neither of these incorporates RNA-seq data. SnowyOwl is a recently developed pipeline designed specifically to annotate fungal genomes using RNA-seq data and homology information. Although designed for fungi, SnowyOwl selects from GHMM predictions made by AUGUSTUS [9,10,14], a gene predictor that was optimised for application across a broad range of eukaryotes.
In this study we present the gene prediction tool CodingQuarry. It is designed to make protein-coding gene sequence predictions through the use of assembled or aligned RNA-seq transcripts in both GHMM training and prediction. CodingQuarry is differentiated from other gene predictors by the combined use of gene predictions made directly from both transcript and genome sequences.
The choice to tailor CodingQuarry to the prediction of fungal genes and to use assembled, aligned transcripts rather than raw read alignments relates to some key differences between fungal genomes and those of higher eukaryotes. Firstly, fungi exhibit significantly less alternative splicing than higher eukaryotes. Consequently, the task of transcript assembly is simpler, resulting in a higher proportion of correctly assembled full-length transcripts. Secondly, fungi have smaller introns than higher eukaryotes. Recent studies indicate that short introns are reconstructed in transcript assembly with a higher success rate than long introns. These transcript assembly advantages make it feasible to generate coding sequence annotations directly from assembled transcript sequences, a process that is more likely to be error prone in higher eukaryotes.
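Generating a coding sequence directly from an assembled transcript can be illustrated with the simplest possible heuristic: take the longest ATG-initiated open reading frame on the transcript. This is an assumption made for illustration only, a common baseline rather than CodingQuarry's actual method (which the text describes as GHMM-based); the sketch scans only the given strand and ignores partial ORFs that lack a start or a stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(transcript: str):
    """Return (start, end) of the longest ATG...stop ORF on this strand (0-based, end exclusive), or None."""
    seq = transcript.upper()
    best = None
    for frame in range(3):
        start = None  # index of the open ATG in this frame, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None
    return best


# Hypothetical transcript with a short ORF in frame 2 and a longer one in frame 1.
toy_transcript = "CCATGAAATAGGGATGCCCGGGTTTTAAC"
start, end = longest_orf(toy_transcript)
print(start, end, toy_transcript[start:end])
```

On a merged transcript spanning two genes, this heuristic would report only one of the two ORFs, which is one reason a dedicated predictor has to handle merged transcripts explicitly.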
A major consequence of the high gene density observed in fungi is a high proportion of instances whereby the untranslated regions (UTRs) of adjacent transcripts overlap in terms of their positions on genomic DNA. Overlap can be between 3′ and 5′ UTRs of adjacent genes on the same strand, or between 5′ and 5′ or 3′ and 3′ UTRs of adjacent genes on opposite strands. Overlaps from the latter example, particularly in the case of 3′ to 3′, are referred to as sense-antisense (S-AS) overlaps. S-AS overlaps have been observed to occur rarely in many species, but are widespread in fungi [38,39]. Essentially this means that in gene-dense fungal genomes, mapped RNA-seq reads belonging to adjacent genes may support regions of coverage that span two or more loci. This is a more severe problem when ‘unstranded’ RNA-seq chemistries are used, as S-AS overlaps can be distinguished through the use of stranded RNA-seq data. CodingQuarry is designed to work with assembled, aligned transcripts derived from either stranded or unstranded RNA-seq data and to specifically address the problem of merged transcripts, such that these transcript assembly errors do not translate to coding sequence annotation errors or omitted gene loci.
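The overlap categories described above follow directly from transcript coordinates and strands. A minimal sketch, assuming 0-based half-open coordinates and that neither transcript is nested inside the other (nested cases would need the UTR and CDS boundaries themselves); the function name and the category labels are hypothetical.

```python
from typing import Optional, Tuple

Transcript = Tuple[int, int, str]  # (start, end, strand): 0-based, half-open, strand '+' or '-'


def classify_utr_overlap(a: Transcript, b: Transcript) -> Optional[str]:
    """Classify the genomic overlap between two adjacent, non-nested transcripts."""
    a_start, a_end, a_strand = a
    b_start, b_end, b_strand = b
    if max(a_start, b_start) >= min(a_end, b_end):
        return None  # the transcripts do not overlap on the genome
    if a_strand == b_strand:
        return "same strand 3'-5'"  # upstream gene's 3' UTR meets downstream gene's 5' UTR
    # Opposite strands: inspect the transcript that starts further left in genome coordinates.
    left_strand = a_strand if a_start <= b_start else b_strand
    # A '+' transcript's 3' end is its right edge; a '-' transcript's 3' end is its left edge,
    # so a '+' transcript on the left meets its '-' neighbour 3' end to 3' end (convergent genes).
    return "S-AS 3'-3'" if left_strand == "+" else "S-AS 5'-5'"


print(classify_utr_overlap((100, 500, "+"), (450, 900, "-")))
```

With unstranded RNA-seq data the strand field is exactly what is missing, so the two opposite-strand cases cannot be separated from a single merged region of coverage.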
For the purpose of demonstrating CodingQuarry’s performance we have selected two exemplar fungal species, which possess highly reliable sequence and annotation resources: Saccharomyces cerevisiae and Schizosaccharomyces pombe. S. cerevisiae, commonly known as baker’s yeast, has long been a model organism and is important to the winemaking, baking and brewing industries. Sc. pombe, commonly known as fission yeast, is also a model organism. These two species are estimated to have diverged from a common ancestor up to 1000 million years ago [40,41] and are representative of distantly related fungal sub-phyla. In this study we have used the high-quality annotations of these fungi to benchmark the sensitivity and specificity of CodingQuarry, and compare it to other gene predictors.
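Benchmarking against a reference annotation usually reduces to counting matches. The sketch below assumes exon-level comparison with exact coordinate matches and the convention common in gene-prediction benchmarks, where sensitivity is TP/(TP + FN) and "specificity" is TP/(TP + FP) (what other fields call precision). The text does not state CodingQuarry's exact matching criteria, so the function and the example coordinates are hypothetical.

```python
Exon = tuple[int, int]  # (start, end); any consistent coordinate convention works


def exon_sensitivity_specificity(reference: set[Exon], predicted: set[Exon]) -> tuple[float, float]:
    """Exact-match exon-level sensitivity TP/(TP+FN) and specificity TP/(TP+FP)."""
    tp = len(reference & predicted)  # predicted exons that exactly match a reference exon
    fn = len(reference - predicted)  # reference exons that were missed
    fp = len(predicted - reference)  # predicted exons with no exact reference match
    sensitivity = tp / (tp + fn) if reference else 0.0
    specificity = tp / (tp + fp) if predicted else 0.0
    return sensitivity, specificity


# Hypothetical annotations: one boundary error (200-310) and one spurious exon (600-700).
reference = {(1, 100), (200, 300), (400, 500)}
predicted = {(1, 100), (200, 310), (400, 500), (600, 700)}
print(exon_sensitivity_specificity(reference, predicted))
```

Note that a single mis-placed boundary is penalised twice under exact matching: once as a missed reference exon and once as a false prediction.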
Fig. 1. Experimental design and identified phosphopeptides. (A) The workflow of our phosphopeptide-profiling experiments. (B) Two-way overlap between phosphorylated proteins identified in our MMS-induced (WT-MMS), Cds1-dependent (Cds1-MMS), and Rad3-dependent (Rad3-MMS) datasets. (C) Three-way overlap between phosphorylated proteins identified in our MMS-induced (WT-MMS), Cds1-dependent (Cds1-MMS), and Rad3-dependent (Rad3-MMS) datasets and our MMS-induced (WT-MMS), HU-induced (WT-HU), and IR-induced (WT-IR) datasets.
PNAS | Published online June 13, 2016 | E3677
(Dataset S1, Tables S1–S3). The majority (65%) of Rad3-dependent phosphoproteins are also Cds1 dependent (Fig. 1B), consistent with the idea that most Rad3 signaling during S phase goes through Cds1. Moreover, only 7% (18/243) of the unique Rad3-dependent twofold-enriched phosphopeptides were phosphorylated on SQ or TQ, the Rad3 target motif. Although there is significant overlap between the Cds1- and Rad3-dependent targets, a number of phosphorylation events were present in either the Cds1-MMS dataset or the Rad3-MMS dataset but not both. Many of these proteins may be phosphorylated in both a Cds1- and Rad3-dependent manner but are missing from one of the datasets because of experimental variability in detection by mass spectrometry. In particular, the presence of targets that appear in the Cds1-MMS dataset but not the Rad3-MMS dataset may result from the smaller size of the latter. However, some of the Rad3-dependent phosphorylations that are still phosphorylated in cds1Δ cells may be targets of Chk1, which is activated by S-phase DNA damage in cds1Δ cells (13, 25) and phosphorylates some of the same substrates as Cds1 (26). Nonetheless, because Chk1 is not activated during the S phase in wild-type cells (13, 25), these substrates are unlikely to be physiological targets of Chk1. To explore further the possible direct targets of Rad3, we employed immuno-enrichment of phospho-SQ (pSQ)-containing peptides, a technique that should identify less abundant peptides (15). In one experiment, we compared pSQ-enriched peptides in MMS-treated and -untreated S-phase cells, producing the dataset WT-pSQ. We then compared pSQ-enriched peptides in MMS-treated S-phase wild-type and rad3Δ cells, producing the dataset Rad3-pSQ (Fig. 1A and Fig. S1).
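The overlap percentages quoted for these datasets use the smaller dataset as the denominator (for example, 130 of the 200 Rad3-dependent proteins, 65%, are also Cds1 dependent). A minimal sketch of that bookkeeping, together with a twofold-enrichment filter, is below; the function names, the ratio orientation (treated or wild-type signal over control or mutant signal) and the toy protein IDs are assumptions for illustration, not the authors' pipeline.

```python
def twofold_enriched(ratios: dict[str, float], min_fold: float = 2.0) -> set[str]:
    """IDs whose enrichment ratio (assumed treated/control) is at least `min_fold`."""
    return {pid for pid, ratio in ratios.items() if ratio >= min_fold}


def overlap_fraction(a: set[str], b: set[str]) -> float:
    """Fraction of the smaller set that is also present in the larger set."""
    smaller, larger = (a, b) if len(a) <= len(b) else (b, a)
    return len(smaller & larger) / len(smaller) if smaller else 0.0


# Toy data with hypothetical protein IDs.
dataset_1 = twofold_enriched({"p1": 3.0, "p2": 1.5, "p3": 2.0, "p4": 0.8})
dataset_2 = {"p1", "p5"}
print(sorted(dataset_1), overlap_fraction(dataset_1, dataset_2))

# Published count for Cds1-MMS vs. Rad3-MMS: 130 shared proteins out of the 200 in Rad3-MMS.
print(f"{130 / 200:.0%}")  # 65%
```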
We recovered many fewer phosphopeptides in these experiments: 142 and 187, representing 68 and 77 unique sequences in 62 and 72 proteins, respectively, of which 27 and 12 were enriched at least twofold (Dataset S1, Tables S1–S3). We reasoned that the DDR-dependent DNA damage-induced phosphopeptides we identified in S-phase cells would comprise both S-phase damage-specific targets and more general DDR targets. To discriminate between the two, we identified targets of S-phase DDR signaling induced by HU, which triggers cell-cycle arrest by preventing dNTP synthesis and thus arresting DNA polymerases without causing DNA damage per se, and G2 DDR signaling induced by IR, which causes a DNA damage-induced G2 arrest. From the comparison between HU-treated and -untreated cells (dataset WT-HU), we identified 9,057 phosphopeptides representing 3,705 unique sequences in 1,905 proteins, 209 of which were enriched at least twofold in response to HU treatment. From the comparison between irradiated and unirradiated cells (dataset WT-IR), we identified 7,961 phosphopeptides representing 3,266 unique sequences in 1,797 proteins, 122 of which were enriched at least twofold by IR treatment. Many of the HU- and IR-induced peptides overlap with those induced by MMS, although the overlap is notably less than the overlap among the three MMS-treated datasets (Fig. 1C). In total, we identified 33,973 unique phosphopeptides, 2,068 of which, in 726 proteins, were enriched at least twofold in one or more datasets. A summary of the seven datasets is presented in Dataset S1, Table S1. The complete S-phase DDR phosphoproteomic database is provided in Dataset S1, Table S2 and Dataset S2. Proteins with phosphopeptides that are enriched at least twofold in any dataset are listed in Dataset S1, Table S3.
Assessment of the Quality of the Phosphoproteomics Datasets. We assessed the quality of our phosphoproteomic database using several comparative metrics. First, we examined the reproducibility of our data using an internal control. Because of our experimental strategy (differentially mass-labeling and mixing control and experimental samples, identifying peptides in the mass spectrometer, and then looking for their differentially mass-labeled cognates), we often independently isolate a heavy peptide and compare it with its light cognate in one MS cycle and then independently isolate the light version and compare it with its heavy cognate in a subsequent cycle. A comparison of these two measurements reveals how reproducible our measurements are and how much noise we introduce during the LC-MS/MS procedure. Across all datasets, the median difference in calculated enrichment ratios between independent phosphopeptide identifications is 0.0051, demonstrating that our enrichment estimates are highly reproducible (Fig. 2A). As another indication of specificity, we compared the enrichment of phosphorylated RxxS sequences, a known consensus target of Cds1 (27), in our datasets. The WT-MMS and Cds1-MMS datasets both show significant enrichment of RxxpS in their fourfold-enriched phosphopeptides (P = 2.6 × 10⁻² and 1.3 × 10⁻⁵, respectively, Fisher’s exact test). Likewise, SQ-containing phosphopeptides are highly enriched in the pSQ affinity datasets: for WT-pSQ, 114/142 (80%) of phosphopeptides contain an SQ or TQ; for Rad3-pSQ, the ratio is 167/187 (89%). Another indication of the quality of our datasets is the inclusion of known S-phase DDR phosphorylation targets. Three of the best-studied phospho-targets downstream of Cds1 are the Mrc1 mediator, the Cdc25 phosphatase, and its target, the Cdc2 cyclin-dependent kinase, DDR regulation of which prevents cell division during activation of the checkpoint (22, 26, 28–30). The WT-MMS, Cds1-MMS, Rad3-MMS, WT-HU, and WT-IR datasets all contain at least twofold-enriched phosphopeptides for Mrc1, Cdc25, and/or Cdc2 (Fig. 2B). The low level of Mrc1 in the WT-IR dataset is expected because Mrc1 phosphorylation is specific to S phase. In addition, the low level of Cdc2 phosphorylation in the WT-MMS and Cds1-MMS datasets is expected because Cdc2 is normally phosphorylated when complexed with Cdc13 in S phase and G2 checkpoint activation simply maintains that phosphorylation. Cdc2 phosphorylation is increased in the WT-HU and WT-IR datasets because those cells were checkpoint arrested, allowing Cdc13 to accumulate and thus producing more Cdc2–Cdc13 to be phosphorylated. Another known S-phase DDR target, the γ-H2A phosphopeptide, a well-studied direct Rad3 target (16, 31), is enriched at least eightfold in the WT-MMS, Rad3-MMS, and WT-IR datasets (Fig. 2B). Consistent with this phosphorylation being Rad3- but not Cds1-dependent and with γ-H2A not being phosphorylated in response to HU above normal S-phase levels, the phosphopeptide is not enriched in the Cds1-MMS or WT-HU datasets (32). All in all, we find the expected pattern of phosphorylation in the four proteins in 18 of 20 cases across the five datasets. The two exceptions are the low Cdc25 phosphorylation and high Cdc2 phosphorylation in the Rad3-MMS dataset. We ascribe these exceptions to experimental variability, possibly resulting from the smaller size of the Rad3-MMS dataset. Finally, we examined the overlap between damage-induced and DDR kinase-dependent S-phase phosphopeptides, which should be extensive. As expected, in each two-way comparison a majority of the smaller dataset overlaps with the larger dataset: 209/298 (70%) for WT-MMS vs. Cds1-MMS, 103/200 (52%) for WT-MMS vs. Rad3-MMS, and 130/200 (65%) for Cds1-MMS vs. Rad3-MMS (Fig. 1B).
E3678 | www.pnas.org/cgi/doi/10.1073/pnas.1525620113 | Willis et al.
Fig. 2. Quality assessment of phosphopeptide datasets. (A) The distribution of differences in Vista ratios (the ratio of peptide abundances in the control and experimental samples) when two nominally identical measurements were compared: the Vista ratio when the light-labeled peptide was identified first vs. the Vista ratio when the heavy-labeled peptide was identified first. The mean for the distribution is 0.043, and the median is 0.0051. (B) Enrichment across all datasets of four well-characterized phosphoproteins: Mrc1, Cdc25, Cdc2, and histone 2A. A diagram of the checkpoint kinase circuit is shown below the graph.
Analysis of Damage-Induced and Checkpoint-Dependent Phosphorylation Sites. To investigate which kinases may be activated by S-phase DNA damage and to explore the specificity of Cds1- and Rad3-dependent phosphorylations, we analyzed the sequence context of our up-regulated phosphorylations. We used the Motif-X program to identify enriched sequence motifs in our dataset, using the S. pombe proteome as the background (33). In our WT-MMS dataset we find three significantly enriched phosphorylation motifs in our twofold up-regulated phosphopeptides: RxxpS, pSP, and pSD (Fig. 3). The RxxpS-phosphorylated phosphopeptides are presumed to be direct targets of Cds1 (27). SP is the known recognition target of both the CDK and MAP families of kinases; SD is phosphorylated by kinases of the CK2 family. Although we have no direct evidence of their involvement in MMS-induced S-phase phosphorylation, the Sty1 MAP and Cka1 CK2 kinases are both involved in cell-cycle and cell-growth control and could plausibly be responsible for S-phase DNA damage-induced phosphorylations. To determine if the non-Cds1 phosphorylations we observe are caused by DDR-independent responses to MMS or to other kinases activated by Cds1, we examined the sequence context of the phosphorylations up-regulated twofold in our Cds1-MMS and Rad3-MMS datasets. In both datasets, we saw the same range of motifs and extensive substrate overlap with the WT-MMS dataset (Figs. 1 and 3). These results suggest that the majority of S-phase phosphorylations induced by DNA damage are caused, directly or indirectly, by the activation of Rad3 and Cds1. However, if we look specifically at the peptides phosphorylated in a DDR-independent manner (those enriched at least twofold in the WT-MMS dataset but not in the Cds1-MMS or Rad3-MMS datasets), we find they are not enriched for the RxxpS motif (P = 0.46) but instead are enriched for the pSP motif, consistent with checkpoint-independent CDK or MAP kinase phosphorylation (Fig. 3). Because the S-phase DDR is dependent on Rad3, we were surprised to find no enrichment in pSQ phosphorylations in the phosphopeptides up-regulated twofold in either the WT-MMS or Rad3-MMS dataset (P = 0.15 and 0.09, respectively, Fisher’s exact test). This result suggests that most of the Rad3-dependent phosphorylations are regulated indirectly through Cds1 and other downstream kinases. Alternatively, Rad3 SQ substrates may tend to be less abundant than Cds1 substrates. However, pSQ is the motif identified in the pSQ datasets (Fig. 3), indicating that additional direct Rad3 substrates might be found were the proteome to be sampled more deeply. Importantly, we observe a significant enrichment for proline preceding the pSQ site, with 11/37 (P = 8.7 × 10⁻⁸, Fisher’s exact test) pSQ peptides in the WT-pSQ dataset and 4/11 (P = 5.0 × 10⁻⁴) in the Rad3-pSQ dataset being PpSQ motifs. As far as we are aware, PpSQ has not been seen previously for other PIKK kinase extended-recognition motifs, but the fact that it was seen in a dataset derived from antibody enrichment leaves open the possibility that the antibodies used favor prolines in that position rather than the kinases themselves.
Fig. 3. Phosphopeptide motif enrichment. Enriched phosphosite motifs identified by the Motif-X algorithm (33, 34). The residues in gray are the fixed positions defining the motif. At other positions, enriched and depleted residues are shown above and below the midline, respectively. The red lines indicate statistical significance at the 0.05 level. The Cds1-independent phosphopeptides are those found in the WT-MMS dataset but not in the Cds1- or Rad3-MMS datasets. The mitotic target phosphopeptides are those found in the WT-MMS dataset but not in the Cds1- or Rad3-MMS datasets and that have mitotic GO annotations.
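The motif categories discussed in this section are defined purely by the residues flanking the phosphosite, so they can be checked mechanically. The sketch below classifies a single phosphorylated residue against the motifs named in the text (RxxpS, pSP, pSD, pS/TQ and PpSQ). It is a hypothetical helper for illustration, not Motif-X, which instead discovers over-represented motifs statistically against a background proteome.

```python
def phosphosite_motifs(seq: str, i: int) -> set[str]:
    """Motifs from the text matched by a phosphorylated S or T at 0-based index i of seq."""
    residue = seq[i]
    if residue not in ("S", "T"):
        raise ValueError("phosphosite must be S or T")
    following = seq[i + 1] if i + 1 < len(seq) else ""
    preceding = seq[i - 1] if i >= 1 else ""
    motifs = set()
    if residue == "S" and i >= 3 and seq[i - 3] == "R":
        motifs.add("RxxpS")  # presumed direct Cds1 target
    if residue == "S" and following == "P":
        motifs.add("pSP")    # proline-directed, CDK/MAP kinase-type
    if residue == "S" and following == "D":
        motifs.add("pSD")    # CK2-type
    if following == "Q":
        motifs.add("pS/TQ")  # Rad3 target motif
        if residue == "S" and preceding == "P":
            motifs.add("PpSQ")
    return motifs


# Hypothetical peptides; the phosphosite index is 0-based.
print(sorted(phosphosite_motifs("RKASP", 3)))
print(sorted(phosphosite_motifs("APSQE", 2)))
```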
DDR-Dependent Phosphorylations Span a Diverse Spectrum of Biological Functions. The phosphoproteins identified in our datasets span a broad range of biological functions. To investigate which biological processes are preferentially targeted by S-phase DNA damage- and S-phase checkpoint kinase-dependent phosphorylation, we calculated which gene ontology (GO) categories are enriched in each of our datasets (35). Our S-phase DNA damage datasets (WT-MMS, Cds1-MMS, and Rad3-MMS) are enriched in annotated GO terms that fall into several broad categories (gene expression, cytoskeleton and cytokinesis, signal transduction, stress response, cell cycle, and DNA repair), with the bulk of the phosphorylated proteins being associated with gene expression or cytoskeleton and cytokinesis (Fig. 4A and Dataset S1, Table S4). Phosphorylation by the S-phase DDR of proteins involved in gene expression is pervasive. We find four major classes of gene expression-related proteins enriched in our S-phase DNA damage
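GO-term enrichment, like the Fisher's exact tests used for the motif comparisons, is commonly assessed with a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test on the 2 × 2 table of (in dataset / not in dataset) × (annotated / not annotated). The text does not state which method reference 35 implements, so the sketch below is an assumption-labeled illustration using only the Python standard library; correction for testing many GO terms is omitted, and the example counts are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): chance of at least k annotated proteins when n proteins are drawn
    without replacement from a background of N, of which K carry the GO term."""
    upper = min(n, K)
    # Sum the hypergeometric probabilities for k, k + 1, ..., upper annotated hits.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts: 12 of 100 enriched proteins carry a term held by 50 of 5,000 background proteins.
p_value = hypergeom_sf(k=12, n=100, K=50, N=5000)
print(p_value)
```

Exact integer arithmetic via `math.comb` avoids overflow for proteome-sized backgrounds, at the cost of speed; a statistics library would normally be used for thousands of GO terms.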