Why do I get cytosine to guanine/adenine transitions in bisulphite treated sequences?


I got my sequencing results (bisulphite-treated and untreated sequences from the same species, Allium cepa) and now I have to analyse them in the Cymate online tool. I prepared all sequences as described in the original Cymate paper (master sequence and clones, aligned in ClustalW, blunt-ended).

My problem is that I got gaps in the results (image attached, taken from the Cymate result) at positions 27, 50, 58, etc. When I looked back at my original alignment FASTA file and compared those positions, I noticed that the gaps correspond to C to A or C to G substitutions (and Cymate evidently outputs only the C to C and C to T cases).

Does anyone know why I get those C to A or C to G substitutions? Are they mismatches, should I manually remove them from the alignment, or did I make some other mistake?

Any advice would be very useful!

My FASTA alignment file, which I uploaded to Cymate, is here (if anyone wants to check). The first sequence is the master sequence (untreated, original sequence) and the rest are clones (bisulphite-treated sequences).


I quickly aligned your sequences, and as far as I can tell, the answer is that your "consensus" sequence is a poor match for the 3 other sequences:

CLUSTAL format alignment by MAFFT L-INS-i (v7.310)

Consensus_cepa_ gaaagccaaccaccacatccgccatccctcacagtatgccaacgagcagctgaatgactc
MV_A_SP6_izreza aaaaatcaatctccacatccgccatcactcacagtatatttac-agtagataaataa-tt
MV_C_SP6_izreza aaaaatcaatctccacatccgccatcactcacagtatatttac-agtaaataaataa-tt
MV_B_SP6_izreza aaaaatcaatctccacatccgccatcactcacagtatatttac-agtagataaataa-tt
.***… ***.* ************** **********… ** **.*. *.***.* *.

Consensus_cepa_ cgaacaacgcaaagcatgacgcccaaaccgacgaacacccaaccgaaaggccacaagcgc
MV_A_SP6_izreza aaaataacgcaaaacataacgcccaaacagacgtactctcaacctaataacctcgaacgc
MV_C_SP6_izreza aaaataacgcaaaacatgacacccaaacagacatactctcaacctaataacctcgaacgc
MV_B_SP6_izreza aaaatgacgcaaaacataacgcccaaacagacgtactctcaacctaatgaccttgaacgc
.**… *******.***.**.******* ***. ** *.***** **… **… *.***

Consensus_cepa_ aacgtgcattccaaaaccccatgaaccaccgaatcctgcaatgcacaccacgctcgacgc
MV_A_SP6_izreza aacttacattcaaaaactcgataattcacgaaattctgcaattcacaccaaatattgcat
MV_C_SP6_izreza aacttacattcaaaaactcgataattcacggaattctgcaattcacaccaaatatcgcat
MV_B_SP6_izreza aacttacattcaaaaactcgatgattcacggaattctgcaattcacaccaaatatcacat
*** *.***** *****.* **.* .*** .***.******* *******… *…

Consensus_cepa_ cattcgctacagaccacatcgacacgagcgccaagccatccatcacccgaagccatccac
MV_A_SP6_izreza --ttcgctacgttcttcatcgacacgaaaaccaaaatatccattgccaaaaatcattcag
MV_C_SP6_izreza --ttcgctacgttcttcatcgacacgagaaccaaaatatccattgccagaaatcattcag
MV_B_SP6_izreza --ttcgctacgttcttcatcaacacgaaaaccaaaatatccattaccagaaatcattcag
********. *. ****.******… ****… ******… ** .**… ***.**

Consensus_cepa_ ccactcacaaacataccacgaagcacagcaaatatgaagacaaaccctc
MV_A_SP6_izreza acgctcactgaaataacacaaaacacatcaaa-ataataacaaactttc
MV_C_SP6_izreza acgctcactgaaataacacaaaacacatcaaa-ataataacaaactttc
MV_B_SP6_izreza acgctcactgaaataacacaaaacacatcaaa-ataataacaaactttc
*.***** .* *** ***.**.**** **** **.* .******… **

I would suggest trying to come up with a different, more appropriate consensus/reference sequence. This one does not appear to be appropriate.

Granted, I have no idea how you came up with this consensus sequence (it certainly is not a consensus of the other sequences you show), and possibly there is something else going on that I am not understanding.

This does not seem to have anything to do with the bisulfite conversion, as the data also include changes that cannot be attributable to bisulfite, gaps, etc.

Knowing more about the whole workflow would help.


Also, while I'm not familiar with the workflow, could such gaps not be related to methylation on the opposite strand? E.g. CGTA-->TACG.

Is there a particular methylation context that you are looking at? CpGs only?

In some cases there is support for weird variants (e.g. consensus is not an outgroup).

If it's Sanger data, it's always a good idea to inspect the traces directly.
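One way to see how much of the disagreement could be explained by bisulfite conversion at all is to tally each alignment column by type. The sketch below is a hypothetical helper, not part of Cymate; the function name and the category labels are assumptions made for illustration. It counts C>T as conversion read on the same strand as the master, G>A as conversion read from the opposite strand, and lumps every other difference (such as C>A or C>G) into "other".

```python
from collections import Counter


def classify_columns(master: str, clone: str) -> Counter:
    """Tally how each aligned column differs between master and clone.

    Hypothetical helper for illustration; both strings must come from the
    same alignment, so they have equal length and may contain '-' gaps.
    """
    if len(master) != len(clone):
        raise ValueError("aligned sequences must have equal length")
    counts = Counter()
    for ref, obs in zip(master.upper(), clone.upper()):
        if ref == "-" or obs == "-":
            counts["gap"] += 1
        elif ref == obs:
            counts["match"] += 1
        elif (ref, obs) == ("C", "T"):
            counts["C>T"] += 1      # bisulfite conversion, same strand
        elif (ref, obs) == ("G", "A"):
            counts["G>A"] += 1      # bisulfite conversion, opposite strand
        else:
            counts["other"] += 1    # not attributable to bisulfite
    return counts


# First 20 columns of the alignment above (master vs. clone MV_A).
result = classify_columns("gaaagccaaccaccacatcc", "aaaaatcaatctccacatcc")
print(dict(result))
```

A high share of "other" and "gap" columns relative to C>T or G>A would point towards a reference problem rather than a conversion problem, which is the reading suggested in the answer above.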

Chapter 18: DNA Mutation and Repair

- Silent mutation: changes a codon to a synonymous codon that specifies the same amino acid, altering the DNA sequence without changing the amino acid sequence:
* silent mutations can still have phenotypic effects, including:
- different tRNAs are used for synonymous codons (affects the rate of protein synthesis)
- changes at exon-intron junctions that affect splicing
- changes in the binding of miRNAs to complementary sequences in mRNA, which determines whether the mRNA is translated
- Neutral mutation: a missense mutation that alters the amino acid sequence of the protein but does not significantly change its function

- Nitrous Acid: deaminates cytosine, creating uracil
- Transition mutation

3 nonsense codons: UGA UAA & UAG

GGA (for Gly) would be mutated to a nonsense codon by substituting a U for the first G (GGA → UGA)

b) If a single transversion occurs in a codon that specifies Phe, what amino acids
can be specified by the mutated sequence?

c) If a single transition occurs in a codon that specifies Leu, what amino acids can
be specified by the mutated sequence?

a) UUU: CUU (Leu), UCU (Ser), UUC (Phe)
UUC: CUC (Leu), UCC (Ser), UUU (Phe)

b) UUU: AUU (Ile), UAU (Tyr), UUA (Leu), GUU (Val), UGU (Cys), UUG (Leu)

UUC: AUC (Ile), UAC (Tyr), UUA (Leu), GUC (Val), UGC (Cys), UUG (Leu)

c) CUU: UUU (Phe), CCU (Pro), CUC (Leu)
CUC: UUC (Phe), CCC (Pro), CUU (Leu)
CUA: UUA (Leu), CCA (Pro), CUG (Leu)
CUG: UUG (Leu), CCG (Pro), CUA (Leu)
UUG: CUG (Leu), UCG (Ser), UUA (Leu)
UUA: CUA (Leu), UCA (Ser), UUG (Leu)

d) UUA: AUA (Ile), UAA (Stop), UUU (Phe), GUA (Val), UGA (Stop), UUC (Phe)

UUG: AUG (Met), UAG (Stop), UUU (Phe), GUG (Val), UGG (Trp), UUC (Phe)

CUU: AUU (Ile), CAU (His), CUA (Leu), GUU (Val), CGU (Arg), CUG (Leu)
CUC: AUC (Ile), CAC (His), CUA (Leu), GUC (Val), CGC (Arg), CUG (Leu)

CUA: AUA (Ile), CAA (Gln), CUC (Leu), GUA (Val), CGA (Arg), CUU (Leu)

CUG: AUG (Met), CAG (Gln), CUC (Leu), GUG (Val), CGG (Arg), CUU (Leu)
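Lists like these are easy to get wrong by hand, so it can help to generate them. The following is a minimal sketch, assuming the standard genetic code; the function name `single_base_mutants` and the one-letter amino acid output (with `*` for Stop) are choices made for this illustration. A transition swaps a purine for a purine (A/G) or a pyrimidine for a pyrimidine (C/U); any other single-base change is a transversion.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, listed in UCAG order for each codon position.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Transition partner of each base; every other change is a transversion.
TRANSITION = {"U": "C", "C": "U", "A": "G", "G": "A"}


def single_base_mutants(codon: str, kind: str) -> dict:
    """Map each single-base mutant of `codon` to its amino acid.

    `kind` is "transition" or "transversion"; '*' marks a stop codon.
    """
    if kind not in ("transition", "transversion"):
        raise ValueError("kind must be 'transition' or 'transversion'")
    mutants = {}
    for i, base in enumerate(codon):
        for new in BASES:
            if new == base:
                continue
            is_transition = TRANSITION[base] == new
            if (kind == "transition") == is_transition:
                mutant = codon[:i] + new + codon[i + 1:]
                mutants[mutant] = CODON_TABLE[mutant]
    return mutants


print(single_base_mutants("CUC", "transition"))
print(single_base_mutants("UUA", "transversion"))
```

Each codon has exactly three single transitions and six single transversions, which gives a quick count check on any hand-written answer.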

Sequence of DNA template: 3'-TAC TGG CCG TTA GTT GAT ATA ACT-5'
Nucleotide number: 24 nucleotides (nucleotide 1 is the T at the 3' end and nucleotide 24 is the T at the 5' end)

Original Sequence: 3'-TAC TGG CCG TTA GTT GAT ATA ACT-5'

Amino acid sequence: Amino-Met Thr Gly Asn Gln Leu Tyr Stop-Carboxyl

Original sequence: 3'-TAC TGG CCG TTA GTT GAT ATA ACT-5'
Mutated sequence: 3'-TAC TGG CCG TCA GTT GAT ATA ACT-5'

Amino acids: Amino-Met Thr Gly Ser Gln Leu Tyr-Carboxyl

b) The transition results in the formation of a UAA nonsense codon.

Original sequence: 3'-TAC TGG CCG TTA GTT GAT ATA ACT-5'
Mutated sequence: 3'-TAC TGG CCG TTA ATT GAT ATA ACT-5'

Amino acid sequence: Amino-Met Thr Gly Asn-Carboxyl

c) The one-nucleotide deletion results in a frameshift mutation.

Original sequence: 3'-TAC TGG CCG TTA GTT GAT ATA ACT-5'
Mutated sequence: 3'-TAC TGG CGT TAG TTG ATA TAA CT-5'

Amino acids: Amino-Met Thr Ala Ile Asn Tyr Ile -Carboxyl

d) The transversion results in the substitution of His for Gln in the protein.

Original sequence: 3'-TAC TGG CCG TTA GTT GAT ATA ACT-5'
Mutated sequence: 3'-TAC TGG CCG TTA GTA GAT ATA ACT-5'
Amino acids: Amino-Met Thr Gly Asn His Leu Tyr-Carboxyl

Mutated sequence: 3'-TAC TGG CCG TTA GTG GAT ATA ACT-5'
Amino acids: Amino-Met Thr Gly Asn His Leu Tyr-Carboxyl

e) The addition of the three nucleotides results in the addition of Thr to the amino acid sequence of the protein, and is an in-frame insertion.

Original sequence: 3'-TAC TGG CCG TTA GTT GAT ATA ACT-5'
Mutated sequence: 3'-TAC TGG TGG CCG TTA GTT GAT ATA ACT-5'

Amino acids: Amino-Met Thr Thr Gly Asn Gln Leu Tyr-Carboxyl

f) The protein retains the original amino acid sequence.

Original sequence: 3'-TAC TGG CCG TTA GTT GAT ATA ACT-5'
Mutated sequence: 3'-TAC TGG CCA TTA GTT GAT ATA ACT-5'
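The worked answers above can be reproduced with a short script. This is a sketch under stated assumptions: the standard genetic code, a template strand written 3'→5' (so the mRNA read 5'→3' is its base-by-base complement), and illustrative function names (`template_to_mrna`, `translate`) that do not come from the source. Amino acids are returned as one-letter codes, and translation stops at the first stop codon or at an incomplete final codon.

```python
from itertools import product

BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# DNA template base -> complementary RNA base.
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_to_mrna(template_3to5: str) -> str:
    """Transcribe a DNA template written 3'->5' into mRNA written 5'->3'."""
    return "".join(COMPLEMENT[b] for b in template_3to5.replace(" ", "").upper())


def translate(mrna: str) -> str:
    """Translate from the first base until a stop codon or the last full codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


original = "TAC TGG CCG TTA GTT GAT ATA ACT"
frameshift = "TAC TGG CGT TAG TTG ATA TAA CT"
print(translate(template_to_mrna(original)))    # Met Thr Gly Asn Gln Leu Tyr
print(translate(template_to_mrna(frameshift)))  # Met Thr Ala Ile Asn Tyr Ile
```

Running each mutated template through the same two functions gives a direct check of the amino acid sequences listed in parts a) to f).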


BIOL 3500: DNA Mutations

A proper active-site shape depends on two oppositely charged amino acids that attract each other (Glu, which has a - charge, and Arg, which has a + charge). A mutation of the Arg to an uncharged amino acid results in an improper active-site shape. A second mutation that changes a nearby uncharged amino acid to a positively charged one (Lys) restores a nearly proper active-site shape, so the protein can function even with two mutations.
(Two wrongs can sometimes make a right?)

Ames test: uses bacterial tester strains to screen for chemical mutagens

Ex. Salmonella cells that are histidine auxotrophs
(are sensitive to mutation by chemicals)
- Add suspected mutagen to cells
- If mutagen: revertant mutation will produce His prototrophic colonies
- If not mutagen: no growth (cannot revert back to WT)
*Tests whether a chemical is a mutagen by adding it to histidine auxotrophs and checking whether it mutates them into prototrophs (before the mutation they cannot survive without supplied histidine; after it they can)

Auxotrophs: require a chemical in the growth medium that they are unable to synthesize

Many chemicals are not mutagenic themselves but are converted to mutagens by enzymatic detoxification pathways in the liver

1. Add liver enzyme (S9 extract)
2. Add test chemical to middle of plate - Medium without histidine
3. Negative control:
Plate with nonmutagenic solvent - few colonies have spontaneous mutations
4. Number of revertant colonies is a measure of mutagen potency (the test chemical is applied only to the middle of the plate lacking histidine. A positive result is a large number of revertant colonies, that is, cells that survived because of a mutation, surrounding the middle where the test chemical was added. If there are only a few colonies and they do not surround the middle, the result is negative: those survivors arose from spontaneous mutations, not from the chemical acting as a mutagen)

Normal individuals:
- Certain genes & chromosome locations contain regions with trinucleotide repeats
- Sequences are transmitted from parent to offspring without mutation
*Normal individuals have these repeats; what matters is how many

When the number of trinucleotide repeat sequences increases, this may cause mutations

TNRE disorders result when repeat copy number increases above critical size (repeat location frequently expands)
*There is a normal range and disease (critical) range for each trinucleotide sequence

2. Severity depends on inheritance from mother or father (how likely it is to expand to more repeats in future generations)
- HD: TNRE likely if inheritance is from father
- Myotonic muscular dystrophy more likely to worsen if from mother

3. Cause not well understood

4. Increased repeats alters DNA structure (stem loop formation)

Common causes / Description

1. Aberrant recombination - Abnormal crossing over may cause deletions, duplications, translocations, inversions

2. Aberrant segregation - Abnormal chromosome segregation may cause aneuploidy or polyploidy

3. DNA replication errors - Mistake by DNA polymerase may cause point mutation

4. Toxic metabolic products - Products of normal metabolic processes may be chemically reactive agents that can alter DNA structure


The past few years have seen an explosion of interest in the epigenetics of cancer. This has been a consequence of both the exciting coalescence of the chromatin and DNA methylation fields, and the realization that DNA methylation changes are involved in human malignancies. The ubiquity of DNA methylation changes has opened the way to a host of innovative diagnostic and therapeutic strategies. Recent advances attest to the great promise of DNA methylation markers as powerful future tools in the clinic.

Proteins that bind to a specific DNA sequence to control the transcription of the genetic information from DNA to RNA.

DNA methylation that protects bacteria from restriction endonuclease enzymes, providing a defence mechanism against invasion by bacteriophages and viruses.

Regions of DNA that are located near to transcription start sites and control transcription initiation.

Genetic elements that are transcribed into RNA, then reverse-transcribed back into DNA and inserted into the genome.

An epigenetic marking of one copy of the gene (from the mother or father) that ensures gene expression in a parent-of-origin-specific manner.

Gene silencing of transposons by epigenetic mechanisms, including DNA methylation and the effect of small non-coding RNAs, which prevents transcription and ensures genome stability.

Post-translational chemical modifications of amino acid residues on a histone.

Proteins that control access to the genetic information by either inducing histone modifications or using energy to alter histone–DNA interactions.

CpG islands

(CGIs). Regions with high cytosine-phosphate-guanine (CpG) dinucleotide density.

Cells that can differentiate into any other tissue of the body.

Regulatory regions of the genome that are marked by histone modifications and enhance the transcription of their associated genes when bound to transcription factors.

Base excision repair

(BER). A cellular mechanism that removes small base lesions, caused by mismatched or modified DNA bases, from the DNA.

An eight-protein complex of two histone H2A–H2B dimers and two histone H3–H4 dimers that together form the core of the nucleosome.

A type of white blood cell that is fundamental to the adaptive immune system.

A process whereby B cells rearrange parts of the immunoglobulin heavy chain locus to generate antibodies with different properties.

Repetitive nucleotide sequences that protect the ends of chromosomes.

Sequences that occur multiple times throughout the genome.

An enzyme that catalyses the transcription of DNA to RNA.

An adaptor RNA and amino acid carrier that helps to decode mRNA for translation into the synthesis of proteins.

Enzymes that cut DNA at endogenous phosphodiester bonds.

DNA cleavage at the phosphodiester bond resulting in the elimination of the 3′-phosphate residue.

DNA cleavage at the phosphodiester bond resulting in the elimination of the 5′-phosphate residue.

Clustered regularly interspaced short palindromic repeats (CRISPR) systems

Genome-editing systems, such as CRISPR–Cas9, that rely on a bacterial virus-defence mechanism involving repetitive DNA sequences that contain snippets of viral DNA. By manipulating CRISPR systems, DNA can be cut at a desired location, allowing genes to be removed or added.

Transcription activator-like effectors

(TALEs). Proteins that can be programmed to target specific DNA sequences in the genome.

Additional Information

Accession codes: The following coordinates have been deposited in the RCSB Protein Data Bank. The GRE-GRDBD is the GRDBD and GRE complex (PDB ID: 5EMQ), smGRE-GRDBD is the GRDBD and smGRE complex (PDB ID: 5EMC), mmGRE-GRDBD is the GRDBD and mmGRE complex (PDB ID: 5EMP). BisChIP-seq data can be downloaded from the National Center for Biotechnology Information GEO (GSE64171).

How to cite this article: Jin, J. et al. The effects of cytosine methylation on general transcription factors. Sci. Rep. 6, 29119 doi: 10.1038/srep29119 (2016).

Supporting information

S1 Table. Concentration of the DNA samples before bisulfite treatment.

The concentration of the untreated DNA samples obtained from PBMCs of five donors before bisulfite treatment measured by Qubit dsDNA BR (broad range) Assay Kit.

S2 Table. Amount of input DNA for bisulfite treatment and concentration of the DNA samples after bisulfite treatment.

Two aliquots of all five DNA samples were treated two independent times, yielding ten bisulfite-treated samples, and subsequently the duplicate samples were pooled. The quantification measurements were done in duplicate with the Qubit ssDNA Assay kit, and the data shown are averages ± SD of these ten concentrations.

S3 Table. Cq values and ranking of the qPCR experiments from the six used primer pairs.

First the average of the three technical replicates was calculated for all five samples. The data given is the geometric mean of the average values from the five donor samples. Genomic DNA is the measurement of untreated DNA, which is only conducted for the cytosine free primers since they have the same efficiency before and after treatment. The different kits are ranked by the Cq values for every primer pair (Rank in the table). Subsequently, the median of these rankings is calculated to assess a final ranking. This is the ranking given in Table 1.

S4 Table. Results and ranking of the dPCR experiments from the three primer pairs that were used.

First, the average of the two technical replicates was calculated and normalized to the DNA input for all five samples. The data given is the geometric mean of the average values from all five donor samples. The different kits are ranked by the amount of copies per ng bisulfite treated DNA for every primer pair (Rank in the table). Subsequently, the median of these rankings is calculated to assess a final ranking. This is the final ranking given in Table 1.
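The ranking procedure described for S3 and S4 Tables (a geometric mean across donors, a rank per primer pair, then the median of those ranks) could be sketched as follows. The function name, the kit and primer labels, and all numbers are made up for illustration and are not taken from the tables. Ties are not handled, and the direction flag is an assumption: lower Cq is treated as better for qPCR, whereas more copies per ng would be better for dPCR.

```python
from statistics import geometric_mean, median


def median_rank(per_primer: dict, higher_is_better: bool) -> dict:
    """Rank kits within each primer pair, then take the median rank per kit.

    `per_primer` maps primer pair -> kit -> list of per-donor averages.
    """
    ranks = {}
    for donors_by_kit in per_primer.values():
        # Geometric mean of the donor averages for each kit.
        means = {kit: geometric_mean(vals) for kit, vals in donors_by_kit.items()}
        ordered = sorted(means, key=means.get, reverse=higher_is_better)
        for position, kit in enumerate(ordered, start=1):
            ranks.setdefault(kit, []).append(position)
    return {kit: median(r) for kit, r in ranks.items()}


# Hypothetical Cq values (lower is better), two donors per kit for brevity.
cq = {
    "primer1": {"kitA": [24, 25], "kitB": [26, 27], "kitC": [28, 29]},
    "primer2": {"kitA": [26, 26], "kitB": [25, 25], "kitC": [30, 30]},
    "primer3": {"kitA": [20, 21], "kitB": [22, 22], "kitC": [23, 24]},
}
print(median_rank(cq, higher_is_better=False))
```

Using the median of ranks rather than the mean keeps one unusually good or bad primer pair from dominating a kit's final position.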

S5 Table. The percentages of overall DNA loss in the samples.

The DNA loss is assessed by dPCR before and after bisulfite treatment. This dPCR measures the intact copies of specific lengths present in the samples. To obtain the averages given in the table, the average of the two technical replicates was calculated both before and after bisulfite treatment. These averages were used to calculate the average loss per kit of all the five donor samples. Subsequently, the geometric means and standard deviations from these averages were calculated.

S6 Table. Conversion efficiencies and overall methylation percentage for the different kits.

The data were obtained by sequencing the 8 kits that performed best in the previous fragmentation assessments. One out of five donor samples was used, and two separate PCR products were sequenced: amplicons CFP2 and CCP3. CFP2 (414 bp) counts 62 Cs, of which 2 are CpGs; CCP3 (476 bp) counts 159 Cs, of which 32 are CpGs.

S7 Table. Conversion protocol for different temperatures for Epitect (kit 10).

The different protocols followed a fixed time schedule as provided by the manual from the manufacturer. Temperature protocol 3 is the same as the protocol provided by the manufacturer.

S8 Table. Conversion protocol for different time schedules for Epitect (kit 10).

The different protocols always followed the temperature scheme as provided by the manual from the manufacturer. Time protocol 3 is the same as the protocol provided by the manufacturer.

S9 Table. Concentration after elution with Epitect (kit 10) with the different time and temperature protocols as depicted in S7 and S8 Tables.

In the end of the protocol, 2 elutions of the same sample were performed (as recommended by the manufacturer to maximize DNA yield). The data given is a single measurement of the donor used in these time and temperature experiments.

S10 Table. Primer sequences and annealing temperatures for qPCR and dPCR.

S1 Fig.

Bioanalyzer (upper plots) and gel electrophoresis (lower plots) analysis of the different kits. The plots shown in this supplement show the electrophoresis data of the five donor samples.

S2 Fig. Comparison of different recovery analyses (red: Qubit vs black: dPCR).

This recovery is based on the amount of overall DNA loss in the samples as shown in S5 Table.

S3 Fig. qPCR analysis of the alternative protocols from the Epitect Bisulfite kit as shown in S7 and S8 Tables.

Upper panel: protocol changed in conversion temperature (S7 Table). Lower panel: protocol changed in conversion time (S8 Table). Results are given as Cq values ± SD.

S4 Fig. dPCR analysis of the alternative protocols from the Epitect Bisulfite kit as shown in S7 and S8 Tables.

Upper panel: protocol changed in conversion temperature (S7 Table). Lower panel: protocol changed in conversion time (S8 Table). Results are given as number of intact copies per ng bisulfite treated DNA measured by dPCR ± SD.

S5 Fig.

Example of the melting curves (upper panel) and the Caliper LabChip GX results (lower panel) after qPCR showing aspecific melting curves, but specific bands for the same reaction: CCP1_after.

S6 Fig.

Example of the melting curves (upper panel) and the Caliper LabChip GX results (lower panel) after qPCR showing aspecific melting curves, but specific bands for the same reaction: CCP2_after.

S1 Dataset. Sequencing data used for the conversion efficiency.

Fastq files of the sequencing data that were used for the conversion efficiency calculation.


DNA methylation plays pivotal roles in development and cell lineage differentiation in plants and animals [1, 3–6]. While our knowledge of DNA methylation pathways in animals, plants and fungi is relatively advanced, very little is known about DNA methylation in microbial eukaryotes, such as ciliates. Though early work uniformly failed to identify cytosine methylation in Paramecium aurelia, T. thermophila, or O. trifallax [73–75], we have here identified both methylcytosine and hydroxymethylcytosine as vital players in the genome rearrangement process of O. trifallax. We have unambiguously identified these modifications using high-sensitivity nano-flow UPLC-MS, and have tested their functionality by preventing their formation using methyltransferase inhibitors. Because earlier work examined vegetative samples of O. trifallax, which we confirm are lacking in both methylcytosine and hydroxymethylcytosine, it did not detect the de novo methylation and hydroxymethylation that we show occurs only transiently during genome rearrangements. Supporting these observations, a report in 2003 described de novo methylation in the stichotrichous ciliate (and close O. trifallax relative) Stylonychia lemnae [76]. In that work, though detected at low levels in vegetative MIC, cytosine methylation was detectable primarily during the genome rearrangement processes, where it was introduced de novo within eliminated transposon-like sequences [76]. As in the O. trifallax system, methylation was observed in all sequence contexts within the transposable element, and was clustered in a region spanning approximately 500 bp [76]. While our results generally support the conclusions of the S. lemnae study, our work differs in some important ways: firstly, since hydroxymethylation had not yet been identified as an important epigenetic mark in DNA, it was not analyzed in S. lemnae; secondly, O. trifallax DNA methylation/hydroxymethylation occurs at a much higher level (70%-90%) than reported in S. lemnae (25%); thirdly, O. trifallax has significant modification of at least a few macronuclear chromosomes and aberrant splicing products, neither of which was reported for S. lemnae; fourthly, the data presented here directly implicate methylation/hydroxymethylation in O. trifallax's DNA elimination pathway; and, fifthly, we report a 20 bp motif that appears to play a role in directing methylation/hydroxymethylation to particular regions of specific chromosomes. We demonstrate that the DNA methylation process plays a significant functional role in the elimination of repetitive sequences in the MIC, including a highly abundant transposon family and an abundant satellite repeat family. We also report the specific methylation/hydroxymethylation of a small number of aberrantly rearranged molecules but not their correctly rearranged counterparts, suggesting a role for DNA modification in either error recognition during chromosome rearrangement and/or the degradation of such incorrectly rearranged molecules.

Functional data presented here support a role for DNA methylation in degradation pathways, because methylation appears enriched in DNA from the parental MAC, which is targeted for elimination, as well as repetitive MIC eliminated sequences. We found that inhibition of DNA methyltransferases by decitabine led to significant demethylation of 6 out of 9 MAC chromosomes and one MIC locus (the 170 bp satellite repeat; Figure 7c). Coincident with the decitabine-induced loss of methylation from these chromosomes, we observed a mild but often statistically significant accumulation of the chromosomes themselves (the native DNA signal in Figure 7c). While this accumulation is modest, with a maximum 1.5- to 2-fold increase, these data provide compelling support across multiple chromosomes for an intimate link between DNA methylation/hydroxymethylation and degradation during genome rearrangement.

Further support for the model comes from the examination of cells that have completed genome rearrangements after azacitidine and decitabine treatment: 170 bp satellite repeats and TBE1 transposons display a statistically significant accumulation relative to untreated controls (Figure 8b,c,d). In addition, azacitidine treatment induces an accumulation of germline TEBPα and a decrease in MAC versions of the same gene (Figure 8b, c). We observe other genome rearrangement defects upon azacitidine or decitabine treatment: along with TEBPα, Contig4414 also shows lower levels, while two other chromosomes showed elevated levels (Contig18539 and Contig15988), consistent with retention of parental MAC chromosomes that were not degraded properly. These data demonstrate the complexity of the functional consequences of inhibiting DNA methylation: effects may be direct (such as a failure to degrade a given molecule of DNA from the parental MAC) or indirect (for example, if the cell cannot properly eliminate an IES from the MIC version of a gene and therefore does not produce enough MAC product). Further work is needed to disentangle these effects but, taken together, the data implicate a DNA methylation/hydroxymethylation pathway in the elimination of repetitive and single-copy elements from the MIC genome and in the production of a functional macronuclear genome.

The relationship between cytosine methylation and hydroxymethylation in O. trifallax offers new challenges. In mouse, for example, sperm DNA is methylated but paternal genome methylation is rapidly lost upon fertilization [77], as the embryo undergoes epigenetic reprogramming and establishment of new methylation patterns [78, 79]. Hydroxymethylcytosine appears in the paternal, but not maternal, pronucleus during this dramatic re-writing of the epigenetic code [80, 81], coincident with the loss of paternal methylation. Other work has linked hydroxymethylation with tissue-specific promoter activation and, presumably, demethylation during development [82]. Hydroxymethylation is dependent upon pre-existing methylation and so exists in a dynamic tension with it: both modifications can mark the same genomic regions [83], as we see in O. trifallax, and this phenomenon is particularly prevalent in embryonic stem cells [84, 85]. Yet hydroxymethylation also antagonizes methylation by directing its removal and/or blocking methylcytosine-binding heterochromatin proteins [86, 87]. The link between methylation and degradation in O. trifallax suggests that the organism might use hydroxymethylation as a countervailing, stabilizing force, perhaps to target genes that are important for conjugation. Other mechanisms may also be involved in this association: O. trifallax's most hydroxymethylated ribosomal protein gene is a homolog of L12, which in bacteria and yeast can regulate ribosome initiation and elongation [88, 89]. Therefore, changes in expression of the L12-encoding chromosome may have ramifications across the proteome, possibly even shutting down translation while the organism undergoes the elaborate steps of genome remodeling.


Supplementary Figure 1 Borane-containing compounds screening and proposed mechanism for the borane reaction of 5caC.

(a) Borane-containing compounds screened for conversion of 5caC to DHU in an 11mer oligo, with conversion rate estimated by MALDI. 2-picoline borane (pic-borane), borane pyridine, tert-butylamine borane, and ammonia borane could completely convert 5caC to DHU while ethylenediamine borane and dimethylamine borane only gave around 30% conversion rate. No detectable products were measured (n.d.) with morpholine borane, 4-methylmorpholine borane, trimethylamine borane, and cyclohexylamine borane. Other reducing agents such as sodium borohydride and sodium tri(acetoxy)borohydride decomposed rapidly in acidic media and led to incomplete conversion. Sodium cyanoborohydride was not used due to potential for hydrogen cyanide formation under acidic conditions. Pic-borane and pyridine borane were chosen because of the complete conversion, low toxicity and high stability. (b) Proposed mechanism for the borane reaction of 5caC to DHU.

Supplementary Figure 2

Proposed mechanism for the borane reaction of 5fC to DHU.

Supplementary Figure 3 MALDI characterization of 5fC and 5caC containing model DNA oligos treated by pic-borane with or without the blocking of 5fC and 5caC.

5fC was blocked by O-ethylhydroxylamine which becomes oxime and resists pic-borane conversion while 5caC was blocked by ethylamine via EDC conjugation and converted to amide which blocks conversion by pic-borane. All experiments were performed once. Calculated MS was shown in black, and observed MS was shown in red.

Supplementary Figure 4 MALDI characterization of 5mC and 5hmC containing model DNA oligos treated by KRuO4 and pic-borane with or without blocking of 5hmC.

5hmC could be blocked by βGT with glucose and converted to 5gmC. 5mC, 5hmC and 5gmC could not be converted by pic-borane. 5hmC could be oxidized by KRuO4 to 5fC, and then converted to DHU by pic-borane. All experiments were performed once. Calculated MS was shown in black, and observed MS was shown in red.

Supplementary Figure 5 Restriction enzyme digestion showed TAPS effectively converts 5mC to T.

(a) Illustration of restriction enzyme digestion assay to confirm the sequence change caused by TAPS. (b) TaqαI tests to confirm the C-to-T transition caused by TAPS. A PCR-amplified 222 bp model DNA with a TaqαI restriction site in the middle can be cleaved, whereas the amplified product of 5mC-TAPS stayed intact, suggesting loss of the restriction site and hence C-to-T transition. TAPS did not result in C-to-T transition on the unmethylated cytosine since C-TAPS was cleaved in the same way as the original untreated C. Experiment was performed once.

Supplementary Figure 6 Complete C-to-T transition induced after TAPS, TAPSβ and CAPS as indicated by Sanger sequencing.

Model DNA containing single methylated and single hydroxymethylated CpG sites was prepared as described in Supplementary Note 3. TAPS conversion was done following the NgTET1 Oxidation and Pyridine borane reduction protocols described in the Methods. TAPSβ conversion was done following the 5hmC blocking, NgTET1 Oxidation and Pyridine borane reduction protocols. CAPS conversion was done following the 5hmC oxidation and Pyridine borane reduction protocols. After conversion, 1 ng of converted DNA sample was PCR amplified by Taq DNA Polymerase and processed for Sanger sequencing. TAPS converted both 5mC and 5hmC to T. TAPSβ selectively converted 5mC whereas CAPS selectively converted 5hmC. None of the three methods caused conversion on unmodified cytosine and other bases.

Supplementary Figure 7 TAPS is compatible with various DNA and RNA polymerases and induces complete C-to-T transition shown by Sanger sequencing.

The model DNA containing methylated CpG sites for the polymerase test and primer sequences is described in Supplementary Note 3. After TAPS treatment, 5mC was converted to DHU. KAPA HiFi Uracil plus polymerase, Taq polymerase, and Vent exo- polymerase read DHU as T and therefore induce complete C-to-T transition after PCR. Alternatively, primer extension was done with a biotin-labelled primer and isothermal polymerases including Klenow fragment, Bst DNA polymerase, and phi29 DNA polymerase. The newly synthesized DNA strand was separated by Dynabeads® MyOne Streptavidin C1 and then amplified by PCR with Taq polymerase and processed for Sanger sequencing. T7 RNA polymerase could efficiently bypass DHU and insert adenine opposite the DHU site, which is shown by RT-PCR and Sanger sequencing. Other commercial polymerases including KAPA HiFi polymerase, NEB Q5 polymerase, and Phusion polymerase were also tested but failed to amplify DHU containing DNA efficiently.

Supplementary Figure 8 DHU does not show PCR bias compared to T and C.

A model DNA containing one DHU/U/T/C modification was synthesized with the corresponding DNA oligos as described in Supplementary Note 3. Standard curves for each model DNA with DHU/U/T/C modification were plotted based on qPCR reactions with 1:10 serial dilutions of the model DNA input (from 0.1 pg to 1 ng). Every qPCR experiment was run in triplicate (n = 3 technical replicates). The slope of the regression between the log concentration (ng) values and the average Ct values was calculated with the SLOPE function in Excel. PCR efficiency was calculated using the following equation: Efficiency % = (10^(-1/Slope) - 1) * 100%. The amplification factor was calculated using the following equation: Amplification factor = 10^(-1/Slope). The PCR efficiencies for the model DNAs with DHU, T, or C modification were almost the same, which demonstrated that DHU could be read through as a regular base and would not cause PCR bias.
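The two equations in this caption can be sketched in a few lines of Python. This is only an illustration: the Ct values below are invented, the helper names are hypothetical, and the slope is computed as an ordinary least-squares slope of mean Ct against log10 concentration, which is assumed to match what the Excel SLOPE function returns.

```python
import math


def regression_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def amplification_factor(slope):
    """Amplification factor = 10^(-1/Slope)."""
    return 10 ** (-1 / slope)


def pcr_efficiency_percent(slope):
    """Efficiency % = (10^(-1/Slope) - 1) * 100."""
    return (amplification_factor(slope) - 1) * 100


# Hypothetical standard curve: 1:10 dilutions from 1 ng down to 0.1 pg,
# expressed as log10 of the input in ng. The Ct values are made up.
log10_input_ng = [0, -1, -2, -3, -4]
mean_ct = [15.0, 18.4, 21.7, 25.1, 28.4]

slope = regression_slope(log10_input_ng, mean_ct)
print(f"slope = {slope:.3f}")
print(f"amplification factor = {amplification_factor(slope):.3f}")
print(f"efficiency = {pcr_efficiency_percent(slope):.1f}%")
```

A slope of -1/log10(2), roughly -3.32, corresponds to perfect doubling per cycle, that is an amplification factor of 2 and an efficiency of 100%.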

Supplementary Figure 9 TAPS completely converted 5mC to T regardless of DNA fragment length.

(a) Agarose gel images of the TaqαI-digestion assay confirming complete 5mC to T conversion in all samples regardless of DNA fragment length. A 194 bp model sequence from the lambda genome was PCR amplified after TAPS and digested with TaqαI enzyme. The PCR product amplified from the unconverted sample could be cleaved, whereas products amplified from TAPS-treated samples stayed intact, suggesting loss of the restriction site and hence complete C-to-T transition. Experiment was performed once. (b) The C-to-T conversion percentage was estimated by gel band quantification as 100% for all DNA fragment lengths tested.

Supplementary Figure 10 The conversion and false positive rate for different TAPS conditions.

The combination of mTet1 and pyridine borane achieved the highest conversion rate of methylated C (96.5%, calculated with fully CpG methylated Lambda DNA) and the lowest conversion rate of unmodified C (0.23%, calculated with 2kb unmodified spike-in), compared to other conditions with NgTET1 or pic-borane. Shown above are the conversion rates +/- SE of all tested cytosine sites (N of 2kb unmodified positions = 1041, N of covered bacteriophage lambda CpG positions used: mTet1 pyridine borane 6226, mTet1 pic-borane 5871, NgTET1 pyridine borane 5768, NgTET1 pic-borane 6226).

Supplementary Figure 11 TAPS resulted in more even coverage and fewer uncovered positions than WGBS.

Comparison of coverage depth across (a) all bases (N = 2 725 765 481) and (b) CpG sites (N = 43 445 914, based on mm9 genome, which includes potential genetic variants in E14 genome) between WGBS and TAPS, computed on both strands. For ‘TAPS (down-sampled)’, random reads out of all mapped TAPS reads were selected so that the median coverage matched the median coverage of WGBS. Positions with coverage above 50× are shown in the last bin.

Supplementary Figure 12 Modification levels around CpG Islands.

Average modification levels in CpG islands (binned into 20 windows) and 4 kb flanking regions (binned into 50 equally sized windows). Bins with coverage below 3 reads were ignored.

Supplementary Figure 13 TAPS exhibits smaller coverage-modification bias than WGBS.

All CpG sites were binned according to their coverage, and the mean (blue) and the median (orange) modification values are shown in each bin for WGBS (a) and TAPS (b). The CpG sites covered by more than 100 reads are shown in the last bin. The lines represent a linear fit through the data points.
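The binning described in this caption could look roughly like the sketch below. It assumes one bin per integer coverage value and collects every site covered by more than 100 reads in a single last bin (keyed here as cap + 1). The function name and the input tuples are hypothetical, and the linear fit is left out.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by_coverage(sites, cap=100):
    """Group (coverage, modification_level) pairs by coverage.

    Sites with coverage above `cap` share one final bin, keyed cap + 1.
    Returns {bin_key: (mean_level, median_level)} in ascending bin order.
    """
    bins = defaultdict(list)
    for coverage, level in sites:
        key = coverage if coverage <= cap else cap + 1
        bins[key].append(level)
    return {key: (mean(levels), median(levels))
            for key, levels in sorted(bins.items())}


# Made-up CpG sites: (read coverage, modification level between 0 and 1).
sites = [(5, 0.8), (5, 0.6), (40, 0.9), (150, 0.7), (300, 0.5)]
summary = summarize_by_coverage(sites)
for key, (avg, med) in summary.items():
    label = f">{key - 1}" if key > 100 else str(key)
    print(label, round(avg, 3), round(med, 3))
```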

Supplementary Figure 14 Distribution of modification levels across all chromosomes.

Average modification levels in 100 kb windows along mouse chromosomes, weighted by the coverage of CpG, and smoothed using a Gaussian weighted moving average filter with window size 10.
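As a rough sketch of the two steps named in this caption, the code below computes a coverage-weighted mean for one window and then smooths a series of window values with a Gaussian-weighted moving average of window size 10. The caption does not give the kernel width or the edge handling, so the standard deviation (window / 5), the slightly off-centre even window, and the renormalisation at the chromosome ends are all assumptions, as are the function names.

```python
import math


def coverage_weighted_mean(levels, coverages):
    """Mean modification level of the CpGs in one window, weighted by coverage."""
    total = sum(coverages)
    if total == 0:
        return None  # no covered CpG in this window
    return sum(l * c for l, c in zip(levels, coverages)) / total


def gaussian_moving_average(values, window=10, sigma=None):
    """Gaussian-weighted moving average over `window` consecutive points.

    Assumptions: sigma defaults to window / 5, and windows that run past
    either end are truncated with the weights renormalised.
    """
    if sigma is None:
        sigma = window / 5
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        num = 0.0
        den = 0.0
        for j in range(i - half, i - half + window):
            if 0 <= j < len(values):
                w = math.exp(-0.5 * ((j - i) / sigma) ** 2)
                num += w * values[j]
                den += w
        smoothed.append(num / den)
    return smoothed


# Made-up per-window averages along a chromosome (one value per 100 kb window).
window_means = [0.70, 0.72, 0.40, 0.75, 0.74, 0.73, 0.90, 0.71, 0.69, 0.70]
print([round(v, 3) for v in gaussian_moving_average(window_means)])
```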

Supplementary Figure 15 Low-input gDNA and cell-free DNA TAPS libraries prepared with dsDNA KAPA HyperPrep library preparation kit.

Sequencing libraries were successfully constructed with as little as 1 ng of (a) mESC gDNA and (b) cell-free DNA with KAPA HyperPrep kit. Experiment was performed once. Note that cell-free DNA has a sharp length distribution around 160 bp (nucleosome size) due to plasma nuclease digestion. After library construction, it becomes 300 bp, which is the sharp band in (b).


