How does formaldehyde cause protein-DNA crosslinking?

How does formaldehyde cause protein-DNA crosslinking? I would guess it's because the strongly polar water molecule interacts strongly with polar residues on a protein-DNA complex, and adding a less polar solvent causes the DNA and protein to pull more tightly on each other than their pull on the solvent, but I haven't been able to find an answer online.

In this work, the authors find that formaldehyde crosslinking proceeds by formation of a methylol adduct on the protein (through nucleophilic attack by N or S), which then attacks the DNA, or vice versa. The final crosslink is a methylene bridge.

Formaldehyde can react with amino groups in nucleotides and proteins to form a Schiff base, but I don't have a clue how this is involved in crosslinking.
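The two answers above fit together as one sequence: addition of formaldehyde to a nucleophile gives the methylol adduct, loss of water gives the Schiff base, and attack by the second partner leaves a methylene bridge. A minimal sketch of the atom bookkeeping this implies is given below. The monoisotopic masses are standard values rather than figures from the answers, and the helper name `formula_mass` is hypothetical.

```python
# Monoisotopic atomic masses in daltons (standard values, not from the source).
MASS = {"C": 12.000000, "H": 1.00782503, "O": 15.99491462}

def formula_mass(counts):
    """Sum monoisotopic masses for a dict like {"C": 1, "H": 2, "O": 1}."""
    return sum(MASS[element] * n for element, n in counts.items())

FORMALDEHYDE = formula_mass({"C": 1, "H": 2, "O": 1})  # CH2O
WATER = formula_mass({"H": 2, "O": 1})                  # H2O

# Step 1: a nucleophile (N or S) adds to formaldehyde, giving a methylol adduct.
methylol_shift = FORMALDEHYDE
# Step 2: water is lost and the second partner attacks, leaving a -CH2- bridge.
methylene_bridge_shift = FORMALDEHYDE - WATER

print(f"methylol adduct:  +{methylol_shift:.4f} Da")
print(f"methylene bridge: +{methylene_bridge_shift:.4f} Da")
```

The net effect of a completed crosslink is therefore the addition of a single carbon atom between the two partners, which is the same shift a Schiff base intermediate shows on one partner alone.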

Kendric C. Smith 1 and Martin D. Shetlar 2

1 Emeritus Professor, Radiation Oncology (Radiation Biology)
Stanford University School of Medicine
800 Blossom Hill Road, Unit R169, Los Gatos, CA 95032

2 Professor Emeritus of Chemistry and Pharmaceutical Chemistry
School of Pharmacy, Room S-822
University of California, Box 0912
513 Parnassus Avenue
San Francisco, CA 94143-0912

Deoxyribonucleic Acid (DNA) in living cells is associated with a large variety of proteins. Therefore, it is logical to assume that the ultraviolet (UV) irradiation of cells could lead to reactive interactions between DNA and the proteins that are in contact with it. One such reaction that can be envisioned is that the amino acids in these associated proteins may become crosslinked to the bases in DNA. Indeed such reactions do occur, and appear to be important processes that photoexcited DNA undergoes in vivo, as well as in DNA-protein complexes in vitro. [see below]

The first example of UV-induced crosslinking of DNA and proteins in a living system (Escherichia coli) was reported in 1962 (Smith, 1962). In the same study, it was noted that the treatment of E. coli with acridine orange and visible light also resulted in DNA-protein crosslinking. In fact, on the basis of the dose of radiation needed to produce the same reduction in colony formation, it was found that visible light plus dye crosslinked a larger percentage of the DNA than did UV radiation. On the other hand, X-irradiation produced little if any crosslinking (see below) (Smith, 1962). Since these early studies, photoinduced DNA-protein crosslinking has been observed in other cellular systems, as well as in isolated DNA-protein complexes. Among the latter are the crosslinking of histones to DNA in eukaryotic nucleosomes, the crosslinking of RNA polymerase and DNA polymerases to DNA, and the crosslinking of the gene 5 "melting" protein from fd phage to single-stranded DNA. (For reviews and references, see Smith, 1976; Shetlar, 1980; Saito and Sugiyama, 1990.)

Detecting DNA-Protein Crosslinks

A number of methods have been developed for detecting DNA-protein crosslinks (reviewed in Shetlar, 1980). The first method used was based upon the extraction of DNA free of protein from cells following UV irradiation. DNA can be isolated from E. coli free of protein using a detergent (sodium dodecyl sulfate) extraction procedure (Smith and Kaplan, 1961; Smith, 1962).

UV-Induced Crosslinks. With increasing doses of UV radiation there is a linear decrease in the amount of free DNA that can be extracted, and an increasing amount of DNA that remains associated with the denatured proteins (Figure 1).

Note that 30% of the DNA is seven times more sensitive to crosslinking with protein than is the remainder (see below). At a UV dose that kills 99% of the cells, about 10% of the DNA was crosslinked with protein (Smith, 1962).

As one might predict, it is the replicating portion of the DNA chromosome that is more sensitive to UV-induced crosslinking with protein (Smith, 1964a). This was tested by pulse labeling a log-phase culture of E. coli with tritiated thymine, and then replacing the radioactive thymine with non-radioactive thymine, and allowing the culture to continue growing exponentially. Samples were taken at various times over 2 generation times, irradiated and analyzed (Figure 2).

This experiment suggests that the replication proteins are the ones most sensitive to DNA-protein crosslinking, i.e., the DNA-protein crosslinking was greatest immediately after the pulse labeling, and again one and two generations after the pulse labeling.

Photosensitized Crosslinks. When bacterial cells were pretreated with acridine orange or with methylene blue, and then exposed to intense visible light, survival decreased markedly, and a large amount of their DNA was crosslinked to protein. There is a correlation between the relative killing efficiency of the two dyes and the relative production of DNA-protein crosslinks (Figure 3).

X-ray-Induced Crosslinks. After an X-ray dose of 1 krad there was about a 5% loss in the extractability of DNA from E. coli B, but doses up to 40 krads did not alter this value (Smith, 1962). These results may be explained by the fact that X-rays produce a significant number of single- and double-strand breaks in DNA, and the fact that the assay depends upon the selective precipitation of large molecular weight DNA crosslinked with protein.

X-ray-induced crosslinks are observed, however, if cells are irradiated under nitrogen (Barker et al., 2005). Under nitrogen, three times fewer DNA double-strand breaks are formed than when cells are X-irradiated under oxygen (Bonura et al., 1975). X-ray-induced DNA addition reactions are common, however, including DNA-DNA crosslinks (Myers, 1976).

Photoreactions of Nucleobases and Nucleosides with Amino Acids and Related Compounds

The first amino acid shown to photochemically add to uracil was cysteine, to form 5-S-cysteinyl-6-hydrouracil (Smith and Aplin, 1966). The chemical structure of the mixed photoproduct of thymine and cysteine was also determined (Smith, 1970) (Figure 4). Later work showed that this compound is photochemically-produced in two diastereomeric forms, and that two other compounds, namely 5-S-cysteinylmethyluracil (Varghese, 1973) and 5-S-cysteinyl-5,6-dihydrothymine (Shetlar and Hom, 1987) are also produced when thymine is irradiated in the presence of cysteine.

The photoreactivity of various polynucleotides for the addition of [35S]cysteine was also studied. Rate constants for this reaction were measured; poly rU was found to be the most reactive (k=21.8), followed by poly rC (k=8.1), poly dT (k=5.4), and poly rA (k=0.6). Ribonucleic acid showed a biphasic response with k=21.8 and k=4.8 (Smith and Meun, 1968).
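To make the spread in these rate constants easier to compare, the short sketch below expresses each value relative to the most reactive polymer. The numbers are the ones quoted above; the text gives no units, so none are assumed, and the function name `relative_reactivity` is hypothetical.

```python
# Rate constants for [35S]cysteine photoaddition as quoted in the text
# (Smith and Meun, 1968). Units are not stated there, so none are assumed.
RATE_CONSTANTS = {
    "poly rU": 21.8,
    "poly rC": 8.1,
    "poly dT": 5.4,
    "poly rA": 0.6,
}

def relative_reactivity(constants, reference):
    """Express each rate constant as a fraction of the reference polymer's."""
    ref = constants[reference]
    return {name: k / ref for name, k in constants.items()}

relative = relative_reactivity(RATE_CONSTANTS, "poly rU")
for name, value in sorted(relative.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.3f} of poly rU")
```

On this scale poly rA reacts at under 3% of the poly rU rate, which is the roughly 36-fold difference implied by the quoted constants.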

In addition to the products of the photoreactions of thymine and uracil with cysteine, photoproducts have been characterized in a number of other systems containing nucleobases (or nucleosides), and amino acids (or amino acid analogs) (reviewed in Saito and Sugiyama, 1990). For example, the reaction of thymine with the phenolic ring system contained in N-acetyltyrosine results in a compound with the structure shown in Figure 4 (Shaw et al., 1992), while the reaction of thymidine with lysine yields a photoproduct of a very different nature (Figure 5) (Saito et al., 1981, 1983a). The production of this compound involves the attack of the ε-amino moiety of the lysine on the carbonyl group in the 2-position of thymidine. There is evidence that cytosine and 5-methylcytosine (a minor nucleobase found in eukaryotic DNA) photoreact with lysine to form adducts of a similar nature (Dorwin et al., 1988).

Interestingly, 5-methylcytosine also reacts with cysteine analogs (e.g., 3-mercaptopropionic acid) to form a 5-methylcytosine product with a structure analogous to that of the lysine adduct shown in Figure 5 (i.e., with NHR2 being replaced with a SR2 moiety and R1 = H) (Shetlar and Chung, 2013).

Other recent work (Shetlar et al., 2013) indicates that 2'-deoxycytidine photohydrates (as well as 2'-deoxyuridine and uridine hydrates) react with alkylamines (and polyamines) in a secondary dark reaction at near neutral pH to form a deoxycytidine adduct (deoxyuridine adduct, uridine adduct), analogous to that shown in Figure 5, in which the adducted amine moiety becomes attached as NR1; the sugar moiety is attached as NHR2. Early work (Janion and Shugar, 1967) showed that dihydrocytosines can undergo transamination reactions in the dark, in which aliphatic amines displace the 4-amino group of the dihydro compound to form "transaminated" dihydrocytosines. This same type of reaction also occurs with cytosine nucleoside photohydrates, in which the 5,6 bond is saturated. In particular, it has been shown that this reaction occurs when the RNA bacteriophage MS2 is irradiated, leading to crosslinking of lysine residues in coat protein to the genomic nucleic acid (Budowsky et al., 1976). These results suggest that secondary dark reactions, subsequent to the formation of primary photoproducts in DNA, may play a role in DNA-protein crosslinking. Determination of the contributions of various secondary crosslinking reactions to the total photochemistry of DNA in its cellular environment may be a fertile field for further study.

Other amino acids are also reactive with nucleobases and polynucleotides. The first survey performed determined the ability of the 22 common amino acids to add photochemically (254 nm) to uracil. The 11 reactive amino acids were glycine, serine, phenylalanine, tyrosine, tryptophan, cystine, cysteine, methionine, histidine, arginine and lysine. The most reactive amino acids were phenylalanine, tyrosine and cysteine. Therefore, the photochemical addition of amino acids to uracil appears to be a fairly common phenomenon (Smith, 1969). A later study of the photoreactivity of polyuridylic acid indicated that all 20 amino acids were reactive, as well as a variety of glycylpeptides and other peptides (Shetlar et al., 1984c).

When thymine was similarly screened for photochemical reactivity with 22 common amino acids, only lysine, arginine, cysteine and cystine formed heteroadducts after exposure times similar to those used for uracil. Thymine was generally less reactive than uracil with amino acids when exposed to UV radiation (Schott and Shetlar, 1974). This may be due to the shielding of carbon-5 in thymine by the methyl group at that position.

Another study used a fluorescence assay method to assess the photoreactivity of DNA and polynucleotides for the addition of various amino acids and peptides. The reactivities of the 20 amino acids commonly occurring in proteins were determined for their photochemical addition to denatured calf thymus DNA at pH 7. Fifteen amino acids were reactive, with cysteine, lysine, phenylalanine, tryptophan, and tyrosine being the most reactive. Alanine, aspartic acid, glutamic acid, serine, and threonine were unreactive (Shetlar et al., 1984a). Corresponding quantum yields were also determined for many of the glycyldipeptides (e.g., glycylserine) of the same amino acids. It was found that of the peptides studied, those containing lysine, cystine, proline, histidine and the various aromatic acids (phenylalanine, tyrosine, tryptophan) were the most reactive (glycylcysteine was not studied). Interestingly, in peptide form, all of the amino acids studied displayed some degree of reactivity. In almost all cases, amino acids incorporated into peptides had higher reactivities towards photoaddition than the corresponding carboxyl terminal amino acids at the same concentration.

Measurements similar to those done on DNA photoreactivity were made on the photochemical reactivity of four polyribonucleotides, namely poly rA, poly rC, poly rG, and poly rT, towards the addition of glycine and the L-amino acids commonly occurring in proteins, excluding proline (Shetlar et al., 1984b). Poly rA was reactive with eleven of the twenty amino acids tested, with phenylalanine, tyrosine, glutamine, lysine and asparagine being the most reactive. Poly rG reacted with sixteen amino acids; phenylalanine, arginine, cysteine, tyrosine, and lysine displayed the largest quantum yields. Poly rC showed photoreactivity with fifteen amino acids, with phenylalanine, lysine, cysteine, tyrosine and arginine having the highest reactivities. Poly rT was reactive with fifteen of nineteen amino acids surveyed, and showed the highest quantum yields for cysteine, phenylalanine, tyrosine, lysine and asparagine. None of the polynucleotides were reactive with aspartic acid or glutamic acid. Studies on the photoaddition of various glycyldipeptides with each of the polynucleotides indicated that they were often more reactive than the amino acids themselves. In general, poly rT was the most reactive polynucleotide towards photoaddition for most of the amino acids and peptides studied. For example, the quantum yields for the photoaddition of phenylalanine to poly rT, poly rC, poly rA, and poly rG were in the ratio of 80:4:5:3.

Bromo- and iodo-substituted uracils and cytosines are also capable of photoreaction with amino acids. For example, 5-bromouracil photocouples with tyrosine, tryptophan and histidine (Dietz and Koch, 1987), as well as peptide linkages (Dietz et al., 1987). 5-Bromouracil also reacts with ethylamine (Shetlar et al., 1991), a lysine analog, to form a compound analogous to that formed in the reaction of thymine with lysine (Figure 5).

Cells of E. coli whose thymine has been replaced by 5-bromouracil are more sensitive to killing by UV irradiation than are unsubstituted cells (Greer, 1960; Kaplan et al., 1962), and they show a 5-fold greater sensitivity to UV-induced DNA-protein crosslinking than do unsubstituted cells (Smith, 1964b).

Photoinduced Crosslinks in DNA-Protein and Related Systems

A number of nucleobase-amino acid crosslinks have been identified in various UV-irradiated DNA-protein and related systems. For example, thymine-lysine conjugates have been identified as participants in the crosslinking in DNA-histone systems (Saito et al., 1983b; Kurochkina et al., 1987). Thymine-cysteine conjugates have been shown to be produced in the UV-induced crosslinking of the gene 5 protein of fd phage to its corresponding DNA (Paradiso and Konigsberg, 1982). At the level of a nucleoside-peptide system, the single tyrosine contained in Angiotensin I was crosslinked to thymidine when a solution containing these two components was irradiated (Shaw et al., 1992). Similar results were obtained when thymidine, thymidine-5'-phosphate and thymidylyl-[3'-5']-2'-deoxyadenosine were irradiated in the presence of the tyrosine-containing heptad repeat peptide unit found in the largest subunit of the eukaryotic RNA polymerase II multiprotein complex (Connor et al., 1998a).

Photocrosslinking as a Tool for Structural Studies of DNA-Protein Complexes

DNA-protein crosslinking is a valuable tool for studying the structure of DNA-protein complexes. Since the only amino acids and nucleobases that can participate in crosslinking are those in contact in DNA-protein complexes, crosslinking can potentially be used to identify amino acids (or peptides) and nucleobases in those regions involved in binding protein to DNA. While steady state UV irradiation has been used in many studies, the use of pulsed lasers, in conjunction with mass spectrometry, has provided a powerful alternative approach to studying crosslinking, especially for examining contacts in native DNA-protein complexes. For example, this combination of techniques has been used to identify six crosslinked peptides in the complex formed between the single-stranded DNA binding domain of rat DNA polymerase β and the oligonucleotide d(ATATATA) (Connor et al., 1998b). [Experimental aspects of the use of crosslinking to study the structure of nucleic acid-protein complexes are discussed by Williams and Konigsberg (1991) (by steady state UV-induced crosslinking), and by Hockensmith et al. (1991) (by laser pulse-induced crosslinking). Reviews by Meisenheimer and Koch (1997), and Steen and Jensen (2002), provide information about, and references to, a number of studies in which the UV-induced crosslinking of nucleic acid-protein complexes has been used to gain structural information about such complexes.]

DNA-protein complexes in which the DNA component has been modified to contain 5-halogenated uracils or cytosines have also been studied using laser crosslinking-mass spectrometric approaches. For example, it has been shown that tryptophans 54 and 88 in the sequence of the E. coli single-stranded DNA binding protein can be bound to a DNA oligomer in which thymines are replaced with 5-iodouracil moieties (Steen et al., 2001). Nucleic acid-protein complexes, in which thymine has been replaced with 5-bromouracil or cytosine with 5-iodocytosine, have also found use in photochemical experiments designed to study contact regions in these complexes. [For reviews, see Meisenheimer and Koch, 1997; Steen and Jensen, 2002.]

Biological Importance and Repair of DNA-Protein Crosslinks

Since the crosslinking of DNA and protein by UV radiation is many times more sensitive analytically than is thymine dimer formation, it was suggested that DNA-protein crosslinks may play a significant role in the inactivation of bacteria by UV radiation (Smith, 1962).

This hypothesis was subsequently proven by growing E. coli mutants under different conditions that affect cell sensitivity to UV radiation. A direct correlation was observed between the amount of DNA crosslinked to protein by a given dose of UV radiation, and the intrinsic sensitivity to killing by UV radiation under the several growth conditions studied (Smith et al., 1966).

In addition, the increased sensitivity of E. coli to killing by UV irradiation when frozen, and the variation in this sensitivity as a function of the temperature during irradiation, correlated with changes in the amount of DNA that was crosslinked to protein by UV irradiation. These variations in sensitivity to killing did not correlate with the production of thymine dimers (Smith and O'Leary, 1967).

These results on the biological importance of UV radiation-induced DNA-protein crosslinks are consistent with the fact that only about 60% of the survival of E. coli after UV irradiation can be photoreactivated (see module on Photoreactivation), i.e., 40% of lethality must be due to lesions other than cyclobutane pyrimidine dimers. DNA-protein crosslinks cannot be photoreactivated (Smith, 1964b).

DNA-protein crosslinks are repaired by postreplication repair, and they cause a longer delay in DNA synthesis than do pyrimidine dimers (Smith and Hamelin, 1977). [See module on Recombinational DNA Repair]

It should be noted that the formation of DNA-protein crosslinks in cellular systems is induced by stresses other than the absorption of radiation (e.g., chemically-induced reactions and crosslinking mediated by reactive oxygen species). Descriptions of relevant early work on other types of nucleic acid-protein crosslinks are given in Smith (1976), while a more recent review of the formation of some of these types of lesions, as well as their repair and biological consequences, is provided by Barker et al. (2005).

Summary and Conclusions

DNA-protein crosslinks are important lethal lesions in cells exposed to UV radiation. Crosslinks are particularly disruptive, as they occur mostly in the area of the chromosome that is undergoing replication. The structures of a number of adducts that are potentially responsible for crosslinks formed in UV-irradiated DNA-protein complexes have been determined. Surveys of the reactivity of thymine and uracil, as well as polynucleotides of the DNA and RNA nucleobases towards the photoaddition of amino acids have been conducted. Photocrosslinking is a useful tool to map DNA-protein contacts in DNA-protein complexes.

Barker, S, Weinfeld, M, Murray, D. 2005. DNA–protein crosslinks: their induction, repair, and biological consequences. Mut. Res. 589: 111-135.

Bonura, T, Town, CD, Smith, KC, Kaplan, HS. 1975. The influence of oxygen on the yield of DNA double-strand breaks in X-irradiated Escherichia coli K-12. Rad. Res. 63: 567-577.

Budowsky, EI, Simukova, NA, Turchinsky, MF, Boni, IV, Skoblov, YM. 1976. Induced formation of covalent bonds between nucleoprotein components. V. UV or bisulfite induced polynucleotide-protein crosslinkage in bacteriophage MS2. Nucleic Acids Res. 3: 261-276.

Connor, DA, Falick, AM, Shetlar, MD. 1998a. UV light-induced cross-linking of nucleosides, nucleotides and a dinucleotide to the carboxy-terminal heptad repeat peptide of RNA Polymerase II as studied by mass spectrometry. Photochem. Photobiol. 68: 1-8.

Connor, DA, Falick, AM, Young, MC, Shetlar, MD. 1998b. Probing the binding region of the single-stranded DNA-binding domain of rat DNA polymerase β using nanosecond-pulse laser-induced cross-linking and mass spectrometry. Photochem. Photobiol. 68: 299-308.

Dietz, TM, Koch, TH. 1987. Photochemical coupling of 5-bromouracil to tryptophan, tyrosine and histidine, peptide like derivatives in aqueous fluid solution. Photochem. Photobiol. 46: 971-978

Dietz, TM, von Trebra, RJ, Swanson, BJ, Koch, TH. 1987. Photochemical coupling of 5-bromouracil (BU) to a peptide linkage. A model for BU-DNA photocrosslinking. J. Amer. Chem. Soc. 109: 1793-1797

Dorwin, EL, Shaw, AA, Hom, K, Bethel, P, Shetlar, MD. 1988. Photoexchange products of cytosine and 5-methylcytosine with Nα-acetyl-L-lysine and L-lysine. J. Photochem. Photobiol. B. 2: 265-278.

Greer, S. 1960. Studies on ultraviolet irradiation of Escherichia coli containing 5-bromouracil in its DNA. J. Gen. Microbiol. 22:618-634.

Hockensmith, JW, Kubasek, WL, Vorachek, WR, Evertsz, EM, von Hippel, PH. 1991. Laser cross-linking of protein-nucleic acid complexes. Methods Enzymol. 208: 211-236.

Janion, C, Shugar, D. 1967. Reaction of amines with dihydrocytosine analogs and formation of aminoacid and peptidyl derivatives of dihydropyrimidines. Acta Biochim. Pol. 14: 293-302.

Kaplan, HS, Smith, KC, Tomlon, PA. 1962. Effect of halogenated pyrimidines on radiosensitivity of E. coli. Rad. Research, 16:98-113.

Kurochkina, LP, Komissarov, AA, Kolomiitseva, GY. 1987. Localization of the lysine residue in histone H3 forming a thymine-lysine cross-link when deoxyribonucleoprotein is irradiated with UV light. Biochem. USSR, 52: 1457-1461.

Meisenheimer, KM, Koch, TH. 1997. Photocross-linking of nucleic acids to associated proteins. Crit. Rev. Biochem. Mol. Biol. 32: 101-140.

Myers, LS, Jr. 1976. Ionizing radiation-induced attachment reactions of nucleic acids and their components, in Aging, Carcinogenesis, and Radiation Biology (the role of nucleic acid addition reactions), (KC Smith, ed.), Plenum Press, New York, pp. 261-286.

Paradiso, PR, Konigsberg, W. 1982. Photochemical cross-linking of the gene-5 protein fd DNA complex from fd-infected cells. J. Biol. Chem. 257: 1462-1467 (and references therein).

Saito, I, Sugiyama, H, Ito, S, Furukawa, N, Matsuura, T. 1981. A novel photoreaction of thymidine with lysine. Photoinduced migration of thymine from DNA to lysine. J. Amer. Chem. Soc. 103: 1598-1600.

Saito, I, Sugiyama, H, Matsuura, T. 1983a. Photoreaction of thymidine with alkylamines. Application to selective removal of thymine from DNA. J. Amer. Chem. Soc. 105: 956-962

Saito, I, Sugiyama H, Matsuura, T. 1983b. Isolation and characterization of a thymine-lysine adduct in UV-irradiated nuclei. The role of thymine-lysine photoaddition in photo-cross-linking of proteins to DNA. J. Amer. Chem. Soc. 105: 6989-6991

Saito, I, Sugiyama, H. 1990. Photoreactions of nucleic acids and their constituents with amino acids and related compounds, in Bioorganic Photochemistry (H. Morrison, ed), Vol. 1, John Wiley and Sons, New York, pp 317-340.

Schott, HN, Shetlar, MD. 1974. Photochemical addition of amino acids to thymine. Biochem. Biophys. Res. Comm. 59:1112-1116.

Shaw, AA, Falick, AM, Shetlar, MD. 1992. Photoreactions of thymine and thymidine with N-acetyltyrosine. Biochemistry 31: 10976-10983

Shetlar, MD. 1980. Cross-linking of proteins to nucleic acids by ultraviolet light. Photochem. Photobiol Rev. 5: 105-197.

Shetlar, MD, Christensen, J, Hom, K. 1984a. Photochemical addition of amino acids and peptides to DNA. Photochem. Photobiol. 39:125-133.

Shetlar, MD, Hom, K, Carbone, J, Moy, D, Steady, E, Watanabe, M. 1984b. Photochemical addition of amino acids and peptides to homopolyribonucleotides of the major bases. Photochem. Photobiol. 39: 135-140.

Shetlar, MD, Carbone, J, Steady, E, Hom, K. 1984c. Photochemical addition of amino acids and peptides to polyuridylic acid. Photochem. Photobiol. 39: 141-144

Shetlar, MD, Hom, K. 1987. Mixed products of thymine and cysteine produced by direct and acetone sensitized photoreactions. Photochem. Photobiol. 45: 703-712.

Shetlar, MD, Rose, RB, Hom, K, Shaw, AA. 1991. Ring opening photoreactions of 5-bromouracil and 5-bromo-2'-deoxyuridine with selected alkylamines. Photochem. Photobiol. 53: 595-609.

Shetlar, MD, Chung, J. 2013. Ring-opening photoreactions of 5-methylcytosine with 3-mercaptopropionic acid and other thiols. Photochem. Photobiol. 89: 878-883.

Shetlar, MD, Hom, K, Venditto, VJ. 2013. Photohydrate-mediated reactions of uridine, 2'-deoxyuridine and 2'-deoxycytidine with amines at near neutral pH. Photochem. Photobiol. 89: 868-877.

Smith, KC. 1962. Dose dependent decrease in extractability of DNA from bacteria following irradiation with ultraviolet light or with visible light plus dye. Biochem. Biophys. Res. Commun., 8:157-163.

Smith, KC. 1964a. The photochemical interaction of deoxyribonucleic acid and protein in vivo and its biological importance. Photochem. Photobiol., 3:415-427.

Smith, KC. 1964b. Photochemistry of the Nucleic Acids, in Photophysiology (A.C. Giese, ed), Academic Press, New York, Vol. 2, pp. 329-388.

Smith, KC. 1969. Photochemical addition of amino acids to 14C-uracil. Biochem. Biophys. Res. Commun., 34:354-357.

Smith, KC. 1970. A mixed photoproduct of thymine and cysteine: 5-S-cysteine, 6-hydrothymine. Biochem. Biophys. Res. Commun., 39:1011-1016.

Smith, KC. 1976. Radiation-induced crosslinking of DNA and protein in bacteria, in Aging, Carcinogenesis, and Radiation Biology (the role of nucleic acid addition reactions), (KC Smith, ed.), Plenum Press, New York, pp. 67-81.

Smith, KC, Aplin, RT. 1966. A mixed photoproduct of uracil and cysteine (5-S-cysteine-6-hydrouracil). A possible model for the in vivo crosslinking of deoxyribonucleic acid and protein by ultraviolet light. Biochemistry 5:2125-2130.

Smith, KC, Hamelin, C. 1977. DNA synthesis kinetics, cell division delay, and post-replication repair after UV irradiation of frozen cells of E. coli B/r. Photochem. Photobiol. 25:27-29.

Smith, KC, Hodgkins, B, O'Leary, ME. 1966. The biological importance of ultraviolet light induced DNA-protein crosslinks in Escherichia coli 15 TAU. Biochim. Biophys. Acta 114:1-15.

Smith, KC, Kaplan, HS. 1961. A chromatographic comparison of the nucleic acids from isologous newborn, adult, and neoplastic thymus. Cancer Res. 21:1148-1153.

Smith, KC, Meun, DHC. 1968. Kinetics of the photochemical addition of [35S]cysteine to polynucleotides and nucleic acids. Biochemistry 7:1033-1037.

Smith, KC, O'Leary, ME. 1967. Photoinduced DNA-protein cross-links and bacterial killing: A correlation at low-temperatures. Science 155: 1024-1026.

Steen, H, Jensen, ON. 2002. Analysis of protein-nucleic acid interactions by photochemical cross-linking and mass spectrometry. Mass Spectrometry Reviews 21: 163-182

Steen, H, Peterson, J, Mann, M, Jensen, ON. 2001. Mass spectrometric analysis of a UV-cross-linked protein-DNA complex: Tryptophans 54 and 88 of E. coli SSB cross-link to DNA. Protein Sci. 10: 1989-2001

Varghese, AJ. 1973. Properties of photoaddition products of thymine and cysteine. Biochemistry 12: 2725-2730.

Williams, KR, Konigsberg, WH. 1991. Identification of amino acid residues at interface of protein-nucleic acid complexes by photochemical crosslinking. Methods Enzymol. 208: 516-539.

[NOTE: Kendric Smith's papers cited here are available as PDF files.]

Reactive Groups for Protein Crosslinking

The following functional groups are available for crosslinking/labeling:

  • Primary amines (–NH2): This group exists at the N-terminus of each polypeptide chain (called the alpha-amine) and in the side chain of lysine (Lys, K) residues (called the epsilon-amine). Because they are positively charged at physiologic conditions, primary amines are usually outward-facing (i.e., on the outer surface of proteins); thus, they are usually accessible for conjugation without denaturing protein structure.
  • Carboxyls (–COOH): This group exists at the C-terminus of each polypeptide chain and in the side chains of aspartic acid (Asp, D) and glutamic acid (Glu, E). Like primary amines, carboxyls are usually on the surface of protein structure.
  • Sulfhydryls (–SH): This group exists in the side chain of cysteine (Cys, C). Often, as part of a protein's secondary or tertiary structure, cysteines are joined together between their side chains via disulfide bonds (–S–S–). These must be reduced to sulfhydryls to make them available for crosslinking by most types of reactive groups.
  • Carbonyls (–CHO): Ketone or aldehyde groups can be created in glycoproteins by oxidizing the polysaccharide post-translational modifications (glycosylation) with sodium meta-periodate.

A number of chemical reactive groups have been characterized and used to target the main kinds of protein functional groups. Some of the more popular groups are as follows:

  • Carboxyl-to-amine reactive groups: Carbodiimide
  • Amine reactive groups: NHS ester, Imidoester
  • Sulfhydryl reactive groups: Maleimide, Haloacetyl, Pyridyldisulfide
  • Aldehyde reactive groups: Hydrazide, Alkoxyamine
  • Photoreactive (i.e., nonselective, random insertion) groups: Diazirine, Aryl Azide
  • Hydroxyl reactive groups: Isocyanate
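
The pairing of target functional groups with reactive chemistries listed above can be written as a small lookup table. This is only an illustrative sketch: the entries are transcribed from the list, while the dictionary layout and the function names `chemistries_for` and `targets_of` are hypothetical.

```python
# Reactive chemistries grouped by the functional group they target,
# transcribed from the list in the text. The layout is illustrative only.
REACTIVE_GROUPS = {
    "carboxyl-to-amine": ["carbodiimide"],
    "amine": ["NHS ester", "imidoester"],
    "sulfhydryl": ["maleimide", "haloacetyl", "pyridyldisulfide"],
    "aldehyde": ["hydrazide", "alkoxyamine"],
    "photoreactive": ["diazirine", "aryl azide"],
    "hydroxyl": ["isocyanate"],
}

def chemistries_for(target):
    """Return the reactive groups listed for a target, or an empty list."""
    return REACTIVE_GROUPS.get(target.lower(), [])

def targets_of(chemistry):
    """Reverse lookup: the target(s) a given reactive group is listed under."""
    wanted = chemistry.lower()
    return [target for target, groups in REACTIVE_GROUPS.items()
            if wanted in (group.lower() for group in groups)]

print(chemistries_for("Sulfhydryl"))
print(targets_of("NHS ester"))
```

A heterobifunctional crosslinker would draw one entry from each of two rows, for example an amine-reactive NHS ester paired with a sulfhydryl-reactive maleimide.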

EDC and other carbodiimides are zero-length crosslinkers; they cause direct conjugation of carboxylates (–COOH) to primary amines (–NH2) without becoming part of the final crosslink (amide bond) between target molecules. EDC crosslinking reactions must be performed in conditions devoid of extraneous carboxyls and amines. Because peptides and proteins contain multiple carboxyls and amines, direct EDC-mediated crosslinking usually causes random polymerization of polypeptides. Nevertheless, this reaction chemistry is used widely in immobilization procedures (e.g., attaching proteins to a carboxylated surface) and in immunogen preparation (e.g., attaching a small peptide to a large carrier protein).
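
One way to see what "zero-length" means in practice: because no atoms of the carbodiimide remain in the product, the conjugate weighs the sum of the two partners minus the one water lost on forming the amide bond. The sketch below assumes that simple picture; the water mass is a standard value, and the partner masses and the function name are hypothetical, chosen only to illustrate the arithmetic.

```python
WATER_MONOISOTOPIC = 18.010565  # Da, H2O (standard value, not from the source)

def zero_length_conjugate_mass(carboxyl_partner_mass, amine_partner_mass):
    """Mass of an amide-linked conjugate: the two partners minus one water."""
    return carboxyl_partner_mass + amine_partner_mass - WATER_MONOISOTOPIC

# Hypothetical masses used only to illustrate the bookkeeping.
peptide = 1500.000
carrier_fragment = 2500.000
print(f"{zero_length_conjugate_mass(peptide, carrier_fragment):.3f} Da")
```

A crosslinker with a spacer arm would instead add the mass of the spacer to this total, which is the practical difference from a zero-length reagent.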

NHS esters are reactive groups formed by EDC-activation of carboxylate molecules. NHS ester-activated crosslinkers and labeling compounds react with primary amines in slightly alkaline conditions (pH 7.2-8.5) to yield stable amide bonds. The reaction releases N-hydroxysuccinimide (MW 115), which can be removed easily by dialysis or desalting. Primary amine buffers such as Tris or TBS are not compatible because they compete for reaction; however, in some procedures, it is useful to add Tris or glycine buffer at the end of a conjugation procedure to quench (stop) the reaction.

Sulfo-NHS esters are identical to NHS esters except that they contain a sulfonate (–SO3) group on the N-hydroxysuccinimide ring. This charged group has no effect on the reaction chemistry, but it does tend to increase the water-solubility of crosslinkers containing them. In addition, the charged group prevents sulfo-NHS crosslinkers from permeating cell membranes, enabling them to be used for cell surface crosslinking methods.

Imidoester crosslinkers react with primary amines to form amidine bonds. To ensure specificity for primary amines, imidoester reactions are best performed in amine-free, alkaline conditions (pH 10), such as with borate buffer.

Because the resulting amidine bond is protonated, the crosslink has a positive charge at physiological pH, much like the primary amine which it replaced. For this reason, imidoester crosslinkers have been used to study protein structure and molecular associations in membranes and to immobilize proteins onto solid-phase supports while preserving the isoelectric point (pI) of the native protein. However, the more stable and efficient NHS-ester crosslinkers have steadily replaced them in most applications.

Maleimide-activated crosslinkers and labeling reagents react specifically with sulfhydryl groups (–SH) at near neutral conditions (pH 6.5-7.5) to form stable thioether linkages. Disulfide bonds in protein structures (e.g., between cysteines) must be reduced to free thiols (sulfhydryls) to react with maleimide reagents. Extraneous thiols (most reducing agents) must be excluded from maleimide reaction buffers, because they will compete for coupling sites.

Short homobifunctional maleimide crosslinkers enable disulfide bridges in protein structures to be converted to permanent, irreducible linkages between cysteines. More commonly, the maleimide chemistry is used in combination with amine-reactive NHS-ester chemistry in the form of heterobifunctional crosslinkers that enable controlled, two-step conjugation of purified peptides and/or proteins.

Most haloacetyl crosslinkers contain an iodoacetyl or a bromoacetyl group. Haloacetyls react with sulfhydryl groups at physiologic to alkaline conditions (pH 7.2 to 9), resulting in stable thioether linkages. To limit free iodine generation, which has the potential to react with tyrosine, histidine and tryptophan residues, perform iodoacetyl reactions in the dark.

Pyridyl disulfides react with sulfhydryl groups over a broad pH range to form disulfide bonds. As such, conjugates prepared using these crosslinkers are cleavable with typical disulfide reducing agents, such as dithiothreitol (DTT).

During the reaction, a disulfide exchange occurs between the –SH group of the target molecule and the 2-pyridyldithiol group of the crosslinker. Pyridine-2-thione (MW 111; λmax 343 nm) is released as a byproduct that can be monitored spectrophotometrically and removed from protein conjugates by dialysis or desalting.

Carbonyls (aldehydes and ketones) can be produced in glycoproteins and other polysaccharide-containing molecules by mild oxidation of certain sugar glycols using sodium meta-periodate. Hydrazide-activated crosslinkers and labeling compounds will then conjugate with these carbonyls at pH 5 to 7, resulting in formation of hydrazone bonds.

Hydrazide chemistry is useful for labeling, immobilizing or conjugating glycoproteins through glycosylation sites, which are often (as with most polyclonal antibodies) located at domains away from the key binding sites whose function one wishes to preserve.

Although not currently as popular or common as hydrazide reagents, alkoxyamine compounds conjugate to carbonyls (aldehydes and ketones) in much the same manner as hydrazides.

Photoreactive reagents are chemically inert compounds that become reactive when exposed to ultraviolet or visible light. Historically, aryl azides (also called phenylazides) have been the most popular photoreactive chemical group used in crosslinking and labeling reagents.

When an aryl azide compound is exposed to UV light, it forms a nitrene group that can initiate addition reactions with double bonds or insertion into C-H and N-H sites or can undergo ring expansion to react with a nucleophile (e.g., primary amine). Reactions can be performed in a variety of amine-free buffer conditions to conjugate proteins or even molecules devoid of the usual functional group "handles".

Photoreactive reagents are most often used as heterobifunctional crosslinkers to capture binding partner interactions. A purified bait protein is labeled with the crosslinker using the amine- or sulfhydryl-reactive end. Then this labeled protein is added to a lysate sample and allowed to bind its interactor. Finally, photo-activation with UV light initiates conjugation via the phenyl azide group.

Diazirines are a newer class of photo-activatable chemical groups that are being incorporated into crosslinking and labeling reagents. The diazirine (azipentanoate) moiety has better photostability than phenyl azide groups, and it is more easily and efficiently activated with long-wave UV light (330-370 nm).

Photo-activation of diazirine creates reactive carbene intermediates. Such intermediates can form covalent bonds through addition reactions with any amino acid side chain or peptide backbone at distances corresponding to the spacer arm lengths of the particular reagent. Diazirine-analogs of amino acids can be incorporated into protein structures by translation, enabling specific recombinant proteins to be activated as the crosslinker.
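The pH windows quoted above for the individual reactive groups can be collected into a small lookup, which makes it easier to see which chemistries are usable in a given buffer. This is only an illustrative sketch: the dictionary layout and the function name `groups_for` are assumptions made for the example, and the pH values are simply those stated in the text (imidoester is entered as the single recommended value, pH 10).

```python
# pH windows as quoted in the text above; names and layout are illustrative only.
REACTIVE_GROUPS = {
    "NHS ester":  {"target": "primary amine", "ph": (7.2, 8.5)},
    "Imidoester": {"target": "primary amine", "ph": (10.0, 10.0)},
    "Maleimide":  {"target": "sulfhydryl",    "ph": (6.5, 7.5)},
    "Haloacetyl": {"target": "sulfhydryl",    "ph": (7.2, 9.0)},
    "Hydrazide":  {"target": "carbonyl",      "ph": (5.0, 7.0)},
}


def groups_for(target, ph):
    """Return reactive groups for `target` whose quoted pH window contains `ph`."""
    return sorted(
        name
        for name, info in REACTIVE_GROUPS.items()
        if info["target"] == target and info["ph"][0] <= ph <= info["ph"][1]
    )


print(groups_for("sulfhydryl", 7.4))
print(groups_for("primary amine", 8.0))
```

The lookup deliberately ignores buffer composition (for example, amine-containing buffers that compete with NHS esters), so it is a first filter rather than a protocol check.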

Optimization of Formaldehyde Cross-Linking for Protein Interaction Analysis of Non-Tagged Integrin β1

Formaldehyde cross-linking of protein complexes combined with immunoprecipitation and mass spectrometry analysis is a promising technique for analysing protein-protein interactions, including those of transient nature. Here we used integrin β1 as a model to describe the application of formaldehyde cross-linking in detail, particularly focusing on the optimal parameters for cross-linking, the detection of formaldehyde cross-linked complexes, the utility of antibodies, and the identification of binding partners. Integrin β1 was found in a high molecular weight complex after formaldehyde cross-linking. Eight different anti-integrin β1 antibodies were used for pull-down experiments and no loss in precipitation efficiency after cross-linking was observed. However, two of the antibodies could not precipitate the complex, probably due to hidden epitopes. Formaldehyde cross-linked complexes, precipitated from Jurkat cells or human platelets and analyzed by mass spectrometry, were found to be composed of integrin β1, α4 and α6 or β1, α6, α2, and α5, respectively.

1. Introduction

Protein-protein interactions are the basis for most cellular processes, including signalling, protein synthesis, and metabolism. Detailed knowledge of these protein networks is required in order to better understand diseases and develop adequate treatments. At present, protein-protein interactions are commonly investigated by yeast-two-hybrid approaches [1] and by in vitro binding studies [2]. However, these approaches are prone to false positive identifications because they do not take into account the temporal and local separations that occur in a living system. One tool to study protein-protein interactions in a physiological context is affinity enrichment of the protein of interest followed by detection of its binding partners using either immunodetection methods or mass spectrometry [3]. However, this classical immunoprecipitation method has two drawbacks. Weak interactions could be missed, if stringent wash conditions are applied. In contrast, nonstringent conditions may enable the identification of more proteins, but many of these could be false positives only binding the bait protein during sample preparation.

One approach to solve this problem is applying covalent cross-linking to intact cells and thereby stabilizing protein-protein interactions, including very weak and transient ones [3]. After this fixation step, highly stringent conditions can be used during cell lysis and affinity enrichment, minimizing the risk of identifying false positives. Several cross-linkers varying in spacer arm lengths, reaction groups, and other properties are commercially available. One of the shortest available cross-linkers is formaldehyde (2.3–2.7 Å), which has been used for a long time in histology and pathology to “freeze” the native state of tissues and cells [4]. The experimental conditions used in these applications lead to a very tight network of cross-links, which prevents the precipitation of one protein of interest as required for protein-protein interaction studies. However, lower formaldehyde concentrations (0.4–2% instead of 4%) and especially shorter reaction times (minutes instead of hours) allow the utilization of formaldehyde as a cross-linker to analyze protein-protein interactions as shown by us and others [5–8].

The application of formaldehyde as a cross-linker has several advantages. Only closely associated proteins can be cross-linked due to the small size of formaldehyde. Furthermore, its high permeability towards cell membranes enables cross-linking in the intact cell, without addition of organic solvents such as dimethyl sulfoxide as necessary for other cross-linkers. Formaldehyde is also thought to allow very fast cross-linking and the stabilization of transient interactions [4]. Finally, formaldehyde is available in almost every laboratory at a cost that amounts to only a fraction of that of other cross-linkers. However, formaldehyde cross-linking is not yet an established standard method and many questions regarding the optimal experimental conditions and the usability of antibodies for pull-down of proteins after formaldehyde treatment remain. For example, epitopes recognized by antibodies raised against endogenous proteins could be destroyed by formaldehyde modification, which would prevent their application [9]. Similarly, the physiological environment of a protein of interest, and the type and extent of its interactions, may also affect the experimental outcome. Therefore, we decided to investigate different aspects of formaldehyde cross-linking in more detail using the transmembrane protein integrin β1.

Integrins are membrane-spanning heterodimeric complexes that play important roles in cell adhesion and migration processes by interacting with components of the extracellular matrix [10]. Each integrin heterodimer is composed of one α and one β subunit, which are noncovalently associated. In humans, 18 α subunits and 8 β subunits are found, which form 24 different heterodimers. The biggest subgroup, with 12 members, is formed by β1-containing heterodimers [11]. Before being able to bind a ligand, integrins have to be activated through an intracellular process termed inside-out signalling. For example, during platelet activation, thrombin triggers talin activation via a pathway involving protein kinase C, the small GTPase Rap1 and the Rap1 effector Rap1-interacting molecule (RIAM) [12]. Activated talin then binds to the intracellular tail of the integrin and causes a conformational change of the two integrin chains. This allows binding of extracellular ligands, which drives the cytoplasmic tail of the integrin to bind additional adaptor proteins, establishes a connection to the cytoskeleton and leads to the delivery of the external signal (outside-in signalling).

Several intracellular interaction partners have been described for integrins, despite the shortness of the intracellular tail of integrins, which varies between 40 and 60 amino acids. 25 adaptor proteins have been reported for integrin β1, including talin, tensin, filamin and kindlin [13]. However, these interactions cannot take place simultaneously, but depend on the activation status of the cell and the integrin. The detailed binding procedures as well as the signalling processes triggered by these are not fully understood. For example, talin and kindlin both interact with integrin β1 and a crosstalk between them is assumed. However, it remains unclear whether both proteins connect with the integrin at the same time or binding occurs sequentially [14]. Studying the interaction partners of integrins using the formaldehyde cross-linking approach, which should be able to identify transient and indirect interaction partners of proteins, may shed more light on these processes and would therefore be very valuable.

In the present study, we report the optimization of a protocol applying formaldehyde cross-linking combined with immunoprecipitation and mass spectrometry (Figure 1(a)) to analyze the interaction network of integrin β1.

Figure 1: Formaldehyde cross-linking. (a) Workflow of formaldehyde cross-linking. Cells are treated with formaldehyde, lysed and protein complexes (oval shapes) are precipitated by antibodies (Y-shaped). Cross-links are indicated by black triangles. Only antibodies whose epitopes are not destroyed during formaldehyde modification can precipitate the complex. (b) Reaction scheme of formaldehyde modification, cross-linking and cross-linking reversal. (c) and (d) Formaldehyde-derived cross-links are preserved if samples are only incubated at the lower temperature, whereas most of the cross-links are reversed at boiling temperature. (c) Schematic model. Proteins are depicted as oval shapes, formaldehyde cross-links as black triangles. (d) Anti-integrin

2. Materials and Methods

2.1. Cells and Reagents

Jurkat cells were grown in Dulbecco’s modified medium (GIBCO, high glucose) containing 10% fetal bovine serum (GIBCO), L-glutamine and penicillin/ streptomycin. Human platelets were isolated from healthy human volunteers as described earlier [15]. This was approved by the University of British Columbia Research Ethics Board and informed consent was granted by the donors. Briefly, whole blood was drawn from the antecubital vein into 0.15% (v/v) acid-citrate-dextrose anticoagulant. Platelets were isolated by centrifugation and washed in physiological buffer. All anti-integrin antibodies were monoclonal mouse anti-human antibodies provided by John Wilkins (Manitoba). Goat anti-mouse Alexa Fluor 680 was obtained from Molecular Probes and goat anti-mouse HRP was received from BioRad.

2.2. Modeling of Integrin Structure

The integrin structure was modeled on the structure of human integrin [16] using SWISS MODEL [17, 18], and visualization was performed using the Swiss-Pdb Viewer Deep View.

2.3. Formaldehyde Cross-Linking

Formaldehyde solution was obtained by dissolving 0.4% to 4% paraformaldehyde (Fisher Scientific) in PBS for 2 h at

C. The solution was filtered (0.22 µm), stored in the dark at RT and discarded after 4 weeks. For cross-linking, Jurkat cells were pelleted in a 50 ml reaction tube, resuspended in PBS and counted. Cells were centrifuged again and resuspended to

cells/ml in formaldehyde solution. Cells were incubated with mild agitation for 7 min at RT and then pelleted at 1800 g and RT for 3 min, resulting in 10 minutes of exposure to formaldehyde. The supernatant was removed and the reaction was quenched with 0.5 ml ice-cold 1.25 M glycine/PBS. Cells were transferred to a smaller tube, spun, washed once in 1.25 M glycine/PBS and lysed in 1 ml RIPA buffer (50 mM Tris HCl, pH 8.0, 150 mM sodium chloride, 1% NP40, 0.5% sodium deoxycholate, 0.1% SDS, 1 mM EDTA, protease inhibitors (Complete mini, EDTA-free, Roche Diagnostics)) per

cells for 60 minutes on ice. After 30 minutes, cell lysates were treated with 50 strokes using a Dounce homogenizer. Lysates were spun for 30 minutes at 20000 g and

C to remove insoluble debris. The supernatant was either used directly or stored at −8 C. Control cells were treated exactly the same way, except that they were resuspended in PBS instead of formaldehyde solution. When platelets were used for cross-linking,

cells were resuspended in 10 ml formaldehyde solution and lysed in 1 ml RIPA buffer.
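The arithmetic behind this protocol can be sketched in a few lines. The example assumes the percentages are weight/volume (the text does not say so explicitly), and the 50 ml batch volume and both function names are hypothetical, chosen only for illustration.

```python
def pfa_grams(percent_w_v, volume_ml):
    """Grams of paraformaldehyde for a % (w/v) solution; 1% w/v = 1 g per 100 ml.

    Assumes the percentages in the protocol are weight/volume.
    """
    return percent_w_v * volume_ml / 100.0


def exposure_minutes(incubation_min, spin_min):
    """Total formaldehyde contact time: incubation plus the pelleting spin."""
    return incubation_min + spin_min


# Hypothetical 50 ml batches at concentrations mentioned in the text.
for pct in (0.4, 0.8, 1.2, 2.0, 4.0):
    print(f"{pct}% -> {pfa_grams(pct, 50):.2f} g per 50 ml PBS")

# 7 min incubation plus 3 min centrifugation, as described above.
print(exposure_minutes(7, 3), "min total exposure")
```

Counting the centrifugation step as exposure time matters because the cells remain in formaldehyde until the supernatant is removed and glycine is added.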

2.4. Immunoprecipitation, Western Blot Analysis and Silver Staining of SDS PAGE

Protein concentration was determined using a BCA assay (Pierce). The indicated amounts of antibodies and lysates were incubated for 1 h, after that 10 to 25 µl protein G agarose beads (Immobilized Protein G, Pierce) were added and immunoprecipitation was performed overnight. All steps were performed with mild agitation at C. For mass spectrometric analysis, lysates were precleared by incubation with the same amount of beads for 2 h before the antibody was added. The supernatants of the immunoprecipitations were kept for analysis and the beads were washed either twice with PBS for western blot analysis or three times with RIPA buffer for mass spectrometric analysis. 4x reducing SDS Loading Dye (500 mM Tris HCl, pH 6.8, 8% SDS, 40% glycerine, 20% β-mercaptoethanol, 5 mg/ml bromophenol blue) was added to the beads as well as to the lysates and the supernatants and samples were incubated at either C or C for 5 min or 10–20 min, respectively, before they were separated by SDS PAGE using an 8% Laemmli gel [19]. For western blot analysis, proteins were transferred via a semi-dry procedure on polyvinylidene difluoride membranes (Pall Corporation), blocked for 1 h at RT with 5% milk powder in PBST (PBS, 0.1% Tween 20) and incubated with JB1A (0.1 µg/ml) overnight at C. Membranes were either incubated with anti-mouse HRP or anti-mouse Alexa Fluor 680 (both 1 : 10000) for 1 h at RT. Membranes treated with anti-mouse Alexa Fluor 680 were scanned with a fluorescence scanner (Odyssey, LICOR) using excitation/emission wavelengths of 700 and 800 nm, whereas signals on anti-mouse HRP incubated membranes were detected with chemiluminescence solution (ECL, GE Healthcare). Quantitation of immunoblot analysis was performed using the software ImageJ [20]. Silver staining was performed using a modified protocol of Gharahdaghi et al. [21]. Briefly, gels were fixed with 10% acetic acid/10% methanol, washed with water and sensitized with 1 µg/ml dithiothreitol for 15 min. 
Gels were incubated in 0.1% (w/v) silver nitrate for 15 min and developed with 0.02% (w/v) paraformaldehyde in 3% (w/v) potassium carbonate until the desired staining had occurred. The reaction was stopped by addition of acetic acid.

2.5. In Gel Digestion and LC-MS/MS Analysis

Bands of interest in silver-stained gels were excised and destained as described [21]. Briefly, gel bands were incubated in a freshly prepared destaining solution (15 mM potassium ferricyanide/50 mM sodium thiosulfate) until the colour disappeared, washed several times with water and ammonium bicarbonate (pH 8.0) and chopped into smaller pieces. In-gel digestion was performed using standard procedures [22]. Briefly, samples were reduced with dithiothreitol, alkylated with iodoacetamide and digested with trypsin (Promega) in ammonium bicarbonate overnight. Peptides were extracted twice from the gel pieces, dried down in a vacuum centrifuge, reconstituted in 5% formic acid and STAGE-tip purified [23]. Separation and identification of peptides was performed by nano-HPLC MS/MS on an Agilent 1100 (Agilent, Santa Clara, CA) coupled to an FT-ICR (LTQ-FT, Thermo Electron Corporation, Waltham, MA) using a 15 cm long, 75 µm I.D. fused silica column packed with 3 µm particle size reverse phase beads (Dr. Maisch GmbH, Germany) with water:acetonitrile:formic acid as the mobile phase with gradient elution. Proteins were identified by extracting the Mascot generic format (MGF) files from the MS data using DTA Super Charge (part of the MSQuant open source project) and searching them against the ENSEMBL database with the X!Tandem algorithm embedded in the Global Proteome Machine. The protein expect score (log(e)) indicates the probability of false assignment of a protein: log(e) = −1 equates to a 1 in 10 chance and log(e) = −2 to a 1 in 100 chance of a stochastic protein assignment. Known contaminants such as keratin were removed from the protein lists.
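The relationship between the expect score and the chance of a stochastic assignment described above is a base-10 exponent, which the following sketch makes explicit. The function names and the filtering threshold of −2 are assumptions for illustration; they are not taken from the Global Proteome Machine software or from the study.

```python
def false_assignment_probability(log_e):
    """Convert a base-10 expect score log(e) into the chance of a stochastic assignment."""
    return 10.0 ** log_e


def passes(log_e, threshold=-2.0):
    """True if the score is at least as stringent as the (hypothetical) threshold."""
    return log_e <= threshold


for score in (-1, -2, -5.3):
    p = false_assignment_probability(score)
    print(f"log(e) = {score}: about 1 in {1 / p:,.0f}")
```

More negative scores therefore mean more reliable identifications, and each unit of log(e) corresponds to a tenfold change in the chance of a random match.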

3. Results and Discussion

3.1. Formaldehyde Cross-Linking Conditions

The usage of formaldehyde as cross-linking reagent has to be evaluated in order to determine the optimal balance between highest yield of complex formation and lowest artefact generation [4]. Optimization has to be performed for each protein of interest, as it is dependent on the physiological environment of the protein itself and can vary for example between cytosolic and membrane proteins. Three main parameters play a critical role during formaldehyde cross-linking: the reaction temperature, the incubation time and the formaldehyde concentration. The temperature dependency had been studied in our laboratory earlier and a difference between incubation at C and

C could not be detected (unpublished results). Room temperature is advantageous, as it results in the most convenient and easiest approach possible. In addition, we chose not to increase the incubation time to more than 10 minutes, as the advantage of using formaldehyde as a cross-linker is the short reaction time it requires, which minimizes the formation of unspecific cross-links and allows the fixation of transient interactions. Moreover, model studies on peptides had shown that incubation time and formaldehyde concentration are complementary [24]. Therefore, we decided to limit our study to the usage of different concentrations of formaldehyde in terms of our model protein integrin complex.

Jurkat cells were chosen for studying integrin β1 interactions, as this human T cell line expresses high amounts of this integrin and has been used extensively for its investigation before [25]. Different concentrations of formaldehyde (0.4% to 2%) were used to cross-link Jurkat cells. The lowest concentration was chosen as it had been shown earlier to result in the best protein loss/cross-linking yield balance [4]; the highest was twice as high as formaldehyde concentrations shown to be successful earlier [5]. Cells were lysed under stringent conditions using RIPA buffer to destroy weak and noncovalent interactions, and protein amounts were determined. Lysates of formaldehyde-treated cells contained lower amounts of protein than nontreated cells. This can be explained by the formation of insoluble complexes, for example, nuclear proteins being cross-linked to DNA, which were precipitated during lysis and removed in the insoluble pellet. This effect was visible during sample generation: lysis of nontreated cells using the stringent RIPA buffer led to the release of DNA, which formed a cloudy precipitate and could be easily removed. In contrast, a cloudy suspension was generated during lysis of formaldehyde-treated cells and the pellet had a different consistency, which required a longer centrifugation period to become separated. Consistent with this observation, nuclear proteins were not detected in the lysate by immunoblot analysis. However, membrane proteins were overrepresented due to the loss of nuclear proteins (data not shown). We recommend using this difference in appearance as an early indication of successful cross-linking.

3.2. Detection of Formaldehyde Cross-Linked Integrin Complexes

Optimization of the formaldehyde cross-linking protocol involving integrin required a read-out of the cross-linking efficiency, for example, the detection of a complex containing the integrin. Formaldehyde cross-links are reversible during the standard sample preparation for SDS PAGE analysis (Figure 1(b)), which includes boiling in reducing Laemmli buffer [19], thus cross-linked complexes would not be detected under these conditions [8]. However, by reducing the incubation temperature to C, cross-linked complexes are not fully destroyed and remain detectable (Figure 1(c)) [5]. Membrane protein studies by gel electrophoresis are often performed at even lower temperatures (37–4 C). Initial immunoprecipitation experiments we had performed indicated that at C the antibodies used for pull-downs would not dissociate into their heavy and light chains (data not shown). Instead, during gel electrophoresis the intact antibodies would migrate at molecular weights that would overlap with the cross-linked complexes and therefore interfere with their detection. Consequently, we decided not to use temperatures lower than C in our experiments.

Jurkat cells treated with 2% formaldehyde were used to confirm the detection of cross-linked complexes. Lysates were incubated at C and C, respectively, and analyzed by western blot using the antibody JB1A, which had been used for detection of integrin β1 in earlier studies [26]. We could recognize a higher molecular weight complex containing integrin β1 in samples treated for 5 min at C, whereas this band was nearly undetectable after boiling for 10 minutes at C (Figure 1(d)). Therefore, we concluded that we were visualizing a cross-linked complex containing integrin β1. However, integrin β1 was also found in the monomeric form at approximately 150 kDa after incubation at C. This could be due to incomplete cross-linking, as the conditions applied during formaldehyde cross-linking do not lead to a high extent of protein cross-linking, leaving a large fraction of integrin β1 non-cross-linked. Alternatively, incubation at C may lead to partial reversal of formaldehyde cross-links and release of integrin β1 even at a lower temperature.

3.3. Balance of Cross-Linking Efficiency and Protein Loss

Equal amounts of lysates of the samples generated using different concentrations of formaldehyde were analyzed by immunoblotting (Figure 2(a)). No complex was detected in the control, in which cells were treated under cross-linking conditions but without formaldehyde. With 0.4% formaldehyde, a small amount of complex was detected, which increased with higher formaldehyde concentrations. At 0.8% formaldehyde, much more complex was visible. The amount increased slightly at higher concentrations of formaldehyde, but then levelled off, with no apparent difference between 1.2 and 2%. The signal intensities of monomeric integrin β1 and the complex were quantified, the values at 2% formaldehyde were set to 100% and relative intensities were calculated. These were plotted together with the total protein concentrations of the lysates to determine the optimal cross-linking parameters (Figure 2(b)). The amount of monomeric integrin β1 did not vary significantly between the different formaldehyde concentrations, even though the amount of the complex was increasing (Figure 2(b)). This apparent contradiction can be explained by the aforementioned observation that membrane proteins are enriched during formaldehyde treatment relative to other cellular components. Thus, by loading equal total protein amounts in each lane, increasingly higher amounts of total integrin β1 were applied. Unfortunately, this variation cannot be compensated using loading controls, as the exact amount of each cross-linked protein cannot be predicted. The decrease in protein concentration due to formaldehyde modification was most pronounced between 0% and 0.4% formaldehyde treatment, increased at 0.8% but stabilized at higher formaldehyde concentrations (Figure 2(b)). Comparing the loss of protein to the gain of integrin β1 complex (Figure 2(b)) implied that using a formaldehyde concentration between 1 and 2% should lead to the best cross-linking efficiency/protein loss balance, without major differences in this range. 
However, increasing the extent of cross-linking may also result in the formation of larger complexes by involving proteins that do not directly bind to integrin β1, including the cytoskeleton. This would lead to the formation of more extensive, heterogeneous cross-linked complexes that would provide a larger surface to which nonspecific proteins can bind during sample processing. As a result, an increasing amount of abundant cytoskeletal proteins and common contaminants would be identified that would not be considered specific interactors. Therefore, to minimize such apparent artefact generation, we decided to perform the following investigations using more stringent conditions by applying 0.4% formaldehyde. This increases the likelihood of missed identifications of specific interactors of low abundance and of interactions of low stoichiometry. Even though higher formaldehyde concentrations would counter this effect, they would also result in more artefacts; hence differential analysis using multiple formaldehyde concentrations would still not be able to distinguish between specific interactors and artefacts. Instead, individual follow-up experiments for each identified protein would be required to determine its specificity. As the focus of this study was not to obtain an extensive list of putative interactors, but rather to demonstrate the general validity of the approach, we chose lower formaldehyde concentrations and high stringency. Users interested in maximizing the number of captured proteins should consider using higher formaldehyde concentrations instead.
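The normalization described in this section, where the band intensity at 2% formaldehyde is set to 100% and the other lanes are expressed relative to it, can be sketched as follows. The intensity values are made up purely for illustration and are not data from the study; the function name is likewise hypothetical.

```python
def relative_intensities(raw, reference_key):
    """Scale raw band intensities so that the reference lane equals 100%."""
    ref = raw[reference_key]
    if ref == 0:
        raise ValueError("reference intensity must be non-zero")
    return {conc: 100.0 * value / ref for conc, value in raw.items()}


# Made-up densitometry values (arbitrary units), keyed by % formaldehyde.
complex_band = {0.0: 0.0, 0.4: 150.0, 0.8: 900.0, 1.2: 1150.0, 2.0: 1200.0}

for conc, rel in relative_intensities(complex_band, 2.0).items():
    print(f"{conc:.1f}% formaldehyde: {rel:.1f}% of the 2% signal")
```

Because equal total protein was loaded per lane while membrane proteins became enriched with increasing formaldehyde, such relative values describe the blot signal rather than absolute amounts per cell.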



Formaldehyde cross-linking is an important component of many technologies, including chromatin immunoprecipitation and chromosome conformation capture. The procedure remains empirical and poorly characterized, however, despite a long history of its use in research. Little is known about the specificity of in vivo cross-linking, its efficiency and chemical adducts induced by the procedure. It is time to search this black box.

We think it is urgent to draw attention to the uncertainty introduced in results obtained by ChIP and other formaldehyde fixation-based approaches by the fact the cross-linking efficiency of various proteins to DNA and to each other is drastically different and, in the case of in vivo cross-linking, may depend on local conditions within different cellular compartments.

Current chromatin research is characterized by the fast accumulation of genome-wide data on the distribution of various regulatory proteins along chromosomes. These data are easily accessible through different databases, and much effort has been made to pour more and more data into the pot. Surprisingly, however, not many scientists are concerned about the validity of the chromatin immunoprecipitation (ChIP) approach. The ChIP procedure was developed >15 years ago [ 1] and, essentially, the original protocol is still used without paying much attention to its inherent problems, even though it has long been felt that ‘the devil is in the ChIP details’. The most problematic step is formaldehyde fixation. It is commonly believed that formaldehyde can fix any DNA–protein complex. However, this assumption is far from being universally verified. For example, the lac repressor cannot be fixed to DNA by formaldehyde, even though its DNA-binding domain contains a number of basic amino acid residues [ 2]. The same was reported for NF-κB [ 3]. To specialists in the field, there are chromatin components that are proverbially difficult to cross-link, and specific protocols have been elaborated to solve this problem in some individual cases, mainly in an empiric way (see for instance [ 4]). It was established that there is a temporal threshold for cross-linking reactions such that once the residence time of a protein drops to <5 s, it becomes ‘invisible’ to formaldehyde cross-linking [ 5]. The formaldehyde fixation procedure remains in fact empirical, and little is known about the specificity of in vivo cross-linking, its efficiency and the chemical adducts induced by this procedure. Therefore, scientists performing cross-linking experiments are actually flying blind, and this can cause major problems in data interpretation [ 6, 7].

In recent years, methods (3C, 4C, Hi-C, ChIA-PET, etc.) based on the chromosome conformation capture (3C) procedure [ 8] have been widely used to study promoter–enhancer interactions and other questions related to the 3D architecture of the genome [ 9]. The 3C protocol is based on the assumption that DNA–protein complexes assembled in living cells can be fixed by formaldehyde, and then, after DNA cleavage by restriction enzymes, the complexes containing remote regulatory sequences linked by protein bridges can be solubilized and subjected to different treatments in solution. Recent studies from our groups have shown that this is not the case. Instead, formaldehyde fixation produces a rigid network of chromatin fibers that survives treatment with sodium dodecyl sulphate and restriction enzymes. Although this cross-linked chromatin network can be disrupted by sonication, many otherwise detectable contacts between DNA regulatory elements, such as promoters and enhancers of beta-globin genes, appear to be lost after such treatment [ 10–12]. These results argue that, in living cells, cross-linking of genomic elements via bridges made by regulatory proteins may be a relatively rare event in comparison with cross-linking of chromatin fibers via histones. This may reflect both infrequent juxtaposition of enhancers and promoters and inefficiency of formaldehyde cross-linking. Indeed, there are known examples demonstrating that enhancer–promoter interactions captured by 3C methods do not correlate with colocalization of these elements in vivo assayed by microscopy [ 13]. On the other hand, formaldehyde was reported to be inefficient for cross-linking of proteins that are not directly bound to DNA, such as transcriptional coactivators and corepressors [ 14, 15].

These observations prompt further questions: ‘are there differences between the efficiency of cross-linking in euchromatin and heterochromatin, and if so how do they correlate with genome-wide 3C data?’ For instance, Sanyal et al. reported higher 5C contact frequency in open chromatin regions [ 16], but it is unclear whether this result may be partly due to technical limits of 3C technology favoring cross-linking of open chromatin. In another study from the same laboratory, it was shown that in mitotic chromosomes both the large-scale spatial segregation and topologically associating domains were lost [ 17], but as of today, one cannot exclude the possibility that this pattern may be partly due to inefficient formaldehyde cross-linking of the highly condensed chromatin of mitotic chromosomes. On the other hand, the ability to detect 3C contacts appears to be dependent on the preservation of the architecture of unlysed nuclei [ 10]. As there is no nucleus in mitosis, it may be the absence of this architecture/nuclear compartments that may underlie the absence of 3C contacts in mitosis. Even more worrying results were obtained from the ChIP-seq analysis of distribution of the Silent information regulator (Sir) complex in Saccharomyces cerevisiae. The authors of this study discovered artifactual enrichment of multiple unrelated proteins, including the entire silencing complex, at highly expressed genes, calling into question the results of some previously published ChIP studies [ 6]. The observed phenomenon is most likely related to the existence of so-called high-occupancy target regions or ‘hotspots’ at which many DNA-binding proteins display a signal of enrichment despite the absence of an in vitro binding site in the underlying DNA sequence [ 18].

The chemistry of formaldehyde cross-linking is well known [ 1, 19], but it is the in vivo aspects of the technique that remain obscure. For example, formaldehyde fixation was reported to trigger DNA damage response and massive poly(ADP-ribosyl)ation of nuclear proteins, thus changing the chromatin composition and introducing bias in ChIP analysis [ 7]. The cross-links formed by formaldehyde treatment are fully and easily reversible by heating and a drop in pH, allowing for further analyses of both proteins and DNA. At the same time, the temperature and pH dependence of the cross-linking reaction raises a question of the stability of DNA–protein complexes obtained under different conditions in different applications. It is possible that minor variations in the conditions under which the cross-linking is performed can substantially affect the efficiency of cross-linking and/or the stability of the cross-linked products. This may apply not only to whole cells but also to local compartments within cells that may be embedded in different physicochemical microenvironments at the nanoscale of chromosome domains. It is thus not clear to what extent ChIP profiles reflect the distribution of the protein under study, and to what extent the local cross-linking conditions. The history of research shows remarkable examples of opposite conclusions made based on the results of cross-linking performed in slightly different ways [ 20].

We therefore wish to draw the attention of researchers to the necessity of reconsidering the basic steps of commonly used experimental protocols. As far as in vivo formaldehyde cross-linking is concerned, it is certainly time to upend some widely accepted assumptions. All evidence shows that formaldehyde cross-linking can no longer be used as if it were the molecular biology panacea. It is time to open this black box: to investigate its in vivo molecular biology, its consequences for data collection in ChIP-seq and ‘C’ technologies, and its limits, as well as to explore possible improvements and alternatives.

Formaldehyde cross-linking is commonly used to probe chromatin structure but remains a poorly understood ‘black box’ technology.

Enzymes and Nucleic Acids

(d) Formaldehyde.

Formaldehyde reacts with free amino groups of nucleoside bases, forming methylol or Schiff base derivatives and consequently denaturing dsDNA. Formaldehyde-treated, immobilized DNA can be hybridized with oligonucleotide probes with efficiencies 5–10 times greater than untreated DNA. Formaldehyde reacts initially with DNA at the single-stranded regions that occur during the “breathing” of the double helix. As the chemical modification progresses, optimally at 6× SSC, 10% HCHO, and 60°C for 20–30 min with free DNAs, the double helix eventually collapses, further accelerating the DNA-HCHO reaction. Formaldehyde adducts are stable in neutral buffers in the presence of formaldehyde and prevent DNA from reassociating. Removal of the reagent from the buffer during the (pre)hybridization steps leads to regeneration of the amino groups, thereby allowing unimpeded, efficient hybridization. Formaldehyde can be used on RNA in a similar manner to temporarily disrupt and/or prevent secondary structures.
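
The reaction of formaldehyde with a base amino group described above can be written out as a simple scheme. This is only a generic sketch: R is an illustrative placeholder (not taken from the text) for a nucleoside base carrying an exocyclic amino group, and the align* environment assumes the amsmath package.

```latex
% Generic scheme; R = nucleoside base with an exocyclic amino group (illustrative placeholder)
\begin{align*}
\mathrm{R{-}NH_2 + HCHO} &\rightleftharpoons \mathrm{R{-}NH{-}CH_2OH} && \text{(methylol derivative)}\\
\mathrm{R{-}NH{-}CH_2OH} &\rightleftharpoons \mathrm{R{-}N{=}CH_2 + H_2O} && \text{(Schiff base)}
\end{align*}
```

Both steps are drawn as equilibria, which is consistent with the observation that removing formaldehyde from the buffer regenerates the free amino groups.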


The regulation of gene transcription is fundamental to the existence of complex multicellular organisms such as humans. Although it is widely recognized that much of gene regulation is controlled by gene-specific protein-DNA interactions, there presently exists little in the way of tools to identify proteins that interact with the genome at locations of interest. We have developed a novel strategy to address this problem, which we refer to as GENECAPP, for Global ExoNuclease-based Enrichment of Chromatin-Associated Proteins for Proteomics. In this approach, formaldehyde cross-linking is employed to covalently link DNA to its associated proteins; subsequent fragmentation of the DNA, followed by exonuclease digestion, produces a single-stranded region of the DNA that enables sequence-specific hybridization capture of the protein-DNA complex on a solid support. Mass spectrometric (MS) analysis of the captured proteins is then used for their identification and/or quantification. We show here the development and optimization of GENECAPP for an in vitro model system, comprised of the murine insulin-like growth factor-binding protein 1 (IGFBP1) promoter region and FoxO1, a member of the forkhead rhabdomyosarcoma (FoxO) subfamily of transcription factors, which binds specifically to the IGFBP1 promoter. This novel strategy provides a powerful tool for studies of protein-DNA and protein-protein interactions.

Citation: Wu C-H, Chen S, Shortreed MR, Kreitinger GM, Yuan Y, Frey BL, et al. (2011) Sequence-Specific Capture of Protein-DNA Complexes for Mass Spectrometric Protein Identification. PLoS ONE 6(10): e26217.

Editor: Eliana Saul Furquim Werneck Abdelhay, Instituto Nacional de Câncer, Brazil

Received: August 4, 2011 Accepted: September 22, 2011 Published: October 20, 2011

Copyright: © 2011 Wu et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by the Wisconsin Center of Excellence in Genomics Science through the National Human Genome Research Institute, National Institutes of Health [1P50HG004952]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

DNA–protein crosslinks

DPCs originate when proteins become crosslinked to DNA after exposure to physical or chemical agents, such as UV light or aldehydes, respectively (non-enzymatic DPCs), or as a result of faulty enzymatic reactions (enzymatic DPCs) [8]. Enzymatic DPCs are well exemplified by Topoisomerase-1 and Topoisomerase-2 cleavage complexes (Topo-1ccs, Topo-2ccs). During the physiological reaction of Topoisomerase on DNA, a transient, covalent intermediate (i.e., cleavage complex) forms between the catalytic tyrosine residue and the DNA phosphate group (phosphotyrosyl linkage). Stabilization of the cleavage complex (and formation of a DPC) can happen spontaneously if DNA is damaged, but is enhanced in the presence of poisons, e.g., camptothecin (CPT) or etoposide, for Topo-1 or Topo-2, respectively [9,10]. Notably, Topo-1/2 poisons are widely exploited in cancer chemotherapy [2,11]. Enzymatic DPCs also include crosslinks of DNMT1 (DNA methyltransferase 1) to the DNA methylation inhibitor 5-aza-2’-deoxycytidine (5azadC) incorporated into DNA [12,13], and of HMCES (5-Hydroxymethylcytosine binding, ES-cell-specific) to abasic sites in single-stranded DNA [14].

In the case of non-enzymatic DPCs, virtually any protein—of variable size, structure and nature—in the vicinity of DNA can be crosslinked. One of the most potent crosslinkers, formaldehyde (FA), is heavily present in the environment and produced endogenously via processes like lipid peroxidation, and DNA, RNA or histone demethylation [15–19]. Hence, FA release can occur in the surroundings of DNA, implying that DPCs form continuously and cells must constantly overcome DPC-induced toxicity.

Defective DPC repair leads to sensitivity to crosslinking agents, faulty DNA replication and cell cycle abnormalities, which pave the way for chromosomal instability and carcinogenesis in humans and mice [20]. Hence, multiple pathways work to ensure a proper response to these insults. Nuclease-dependent mechanisms, like nucleotide excision repair (NER) and homologous recombination, operate in both bacteria and eukaryotic cells by excising the DNA flanking the DPC [21–25]. However, NER seems to have a fairly limited role in overall DPC repair, as it can only remove small DPCs (8–10 kDa in size in mammalian cells) [5,23,26]. Topo-1/2ccs can be excised by dedicated tyrosyl-DNA phosphodiesterases, TDP1 and TDP2, which cleave the phosphotyrosyl linkage [27]; this linkage is normally shielded and becomes accessible to TDPs after partial proteolysis or structural changes of Topoisomerases [28–32].

A much more pliable, less discriminate way to process DPCs is through proteolysis of the protein component operated by specialized DPC proteases, namely DPC proteolysis repair.

