Are there any major noticeable limitations to genome sequence compression methods that use reference templates?

Recently, I have been researching big data analytics in biochemistry, and started wondering how genome sequence compression could affect analysis.

Of all the methods listed on the Wikipedia page, the reference template method is my favourite: it not only seems effective, but was also the idea that popped into my head before I had studied the topic at all.

But, before I implement it in a project I am working on, I wanted to know if there are any common and obvious drawbacks/limitations that the bioinformatics industry faces quite often when using this scheme.


The most obvious drawback of the reference template method is that, by the time you can use it, you have already analysed the data.

Usually, you get data from wet-lab biologists. They sequence samples and upload the sequences to a server, and the upload process is often already automated. If you sequence in-house, the data is already on your server. If not, the biologists do not want to do extra work. Moreover, unless it is a big project, you do not have so much data that transferring and storing it becomes a problem. So you will get the data as *.fastq.gz files.

Then you run QC, align the reads, and do SNV calling, expression analysis, or whatever else you need. Alignment and SNV calling can be painful, complicated (copy number variation, heterozygosity, etc.), and time-consuming. That is the real problem. The resulting data are usually stored in Variant Call Format (VCF). VCF was developed under the influence of CRAM and the 1000 Genomes Project, which uses it.


The biggest drawback is that the reference template method does not really address genome sequence compression. It addresses the compression of variant information.
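To make the distinction concrete, here is a minimal sketch of what a reference template scheme actually stores. It assumes two equal-length sequences that differ only by substitutions (no insertions, deletions, or rearrangements), and the function names `encode_variants` and `decode_variants` are made up for illustration rather than taken from any real tool.

```python
def encode_variants(reference, sample):
    """Return (position, alternate_base) pairs where sample differs from reference.

    Assumption: substitution-only differences, so both strings have equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("this toy encoder only handles equal-length sequences")
    return [
        (pos, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt
    ]


def decode_variants(reference, variants):
    """Rebuild the sample by applying the stored substitutions to the reference."""
    bases = list(reference)
    for pos, alt in variants:
        bases[pos] = alt
    return "".join(bases)


reference = "ACGTACGT"
sample = "ACGAACGT"
variants = encode_variants(reference, sample)
restored = decode_variants(reference, variants)
```

Note that the stored object is a list of variants, not a sequence: the sample can only be recovered if the exact same reference is available at decode time.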

Granted, a lot of the interest in genomics has been addressed towards variants, basically because they are easier to analyze than actual whole genomes. Additionally, when many people say "genomics" or "genome sequence" what they mean is "human genomic variation", basically because there is more money to be made with human genomics than with other genomes; and therefore most people are really talking about human genomic variants when they talk about "genomes".

Even in the variant case, templating approaches stop working well, or at least become extremely complex, when the genome contains structural variants. More recent advances such as genome graphs can address some of the biggest issues, including structural variants, but their limits are still not well defined (AFAIK) in terms of how divergent the genomes can be before the approach breaks down.

But leaving that aside, any two sufficiently divergent genomes cannot be compressed by the reference template method or by graphs. The number of differences becomes so large that it is ludicrous to record the differences between the genomes, as that information can end up being larger than just recording the two full genomes and compressing them independently. This becomes clear if you consider trying to compress a Drosophila and a Saccharomyces genome together, or even very closely related genome sequences such as human and chimpanzee, which have: "… approximately thirty-five million single-nucleotide changes, five million insertion/deletion events, and various chromosomal rearrangements."

At that degree of divergence, it is much easier and almost certainly more space-efficient to represent the two genome sequences separately and compress them independently. For a review of tools in this space, you can see here.
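The break-even argument can be illustrated with a toy experiment. The sketch below is only an assumption-laden illustration: it uses random synthetic sequences (real genomes are far more repetitive), models substitutions only, stores the differences as plain "position, base" text, and uses general-purpose `zlib` rather than a genomics-specific compressor. The helper names `mutate`, `delta_bytes`, and `compare` are hypothetical.

```python
import random
import zlib

BASES = "ACGT"


def mutate(seq, rate, rng):
    """Substitute each base with probability `rate` by a different base."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def delta_bytes(reference, target):
    """Serialise substitutions as one 'position<TAB>base' line per difference."""
    lines = [
        f"{i}\t{t}"
        for i, (r, t) in enumerate(zip(reference, target))
        if r != t
    ]
    return "\n".join(lines).encode("ascii")


def compare(rate, length=20000, seed=0):
    """Return (compressed delta size, compressed standalone size) in bytes."""
    rng = random.Random(seed)
    reference = "".join(rng.choice(BASES) for _ in range(length))
    target = mutate(reference, rate, rng)
    delta_size = len(zlib.compress(delta_bytes(reference, target)))
    independent_size = len(zlib.compress(target.encode("ascii")))
    return delta_size, independent_size


close_delta, close_alone = compare(rate=0.001)  # nearly identical sequences
far_delta, far_alone = compare(rate=0.75)       # effectively unrelated sequences
```

With nearly identical sequences the difference list is tiny compared with the compressed sequence itself; once most positions differ, the difference list is the larger of the two, which is the situation described above for highly divergent genomes.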


OMSV enables accurate and comprehensive identification of large structural variations from nanochannel-based single-molecule optical maps

We present a new method, OMSV, for accurately and comprehensively identifying structural variations (SVs) from optical maps. OMSV detects both homozygous and heterozygous SVs, SVs of various types and sizes, and SVs with or without creating or destroying restriction sites. We show that OMSV has high sensitivity and specificity, with clear performance gains over the latest method. Applying OMSV to a human cell line, we identified hundreds of SVs >2 kbp, with 68 % of them missed by sequencing-based callers. Independent experimental validation confirmed the high accuracy of these SVs. The OMSV software is available at http://yiplab.cse.cuhk.edu.hk/omsv/.


INTRODUCTION

Genome engineering is not only a powerful research tool, it is also being developed to cure human diseases, including those of the blood and immune system, most of which can be categorized as still having a great unmet medical need 1–4 . Ex vivo–engineered nuclease-mediated gene editing by HR in hematopoietic stem and progenitor cells (HSPCs) can shed light on stem cell gene function through precise genetic manipulations, and can potentially define a curative strategy for currently incurable hematological diseases. The RNA-guided Type II CRISPR/Cas9 genome-editing system uses a single protein, Cas9, that is guided by a chimeric single-guide RNA (sgRNA) to target DNA through Watson–Crick base-pairing. Because of its simplicity and robustness, it is becoming the most widely used engineered nuclease for editing of mammalian genomes 5,6 . We have previously shown that the high editing performance of a plasmid-based CRISPR/Cas9 system in cell lines did not translate into high editing activity in primary cell types such as human primary T cells and HSPCs 7 . Protection of both sgRNA termini with chemically modified nucleotides increases sgRNA stability and renders the ‘All RNA’-based CRISPR/Cas9 system highly effective in primary HSPCs and T cells 7 . In addition, ribonucleoprotein (RNP) delivery of Cas9 precomplexed with chemically modified sgRNAs consistently increased activity in T cells 7,8 and CD34 + HSPCs (R.O.B., D.P.D., and M.H.P., data not shown). Multiple publications have also shown efficient genome editing in T cells 9 and HSPCs 10,11 without modified sgRNAs in the context of Cas9 RNP delivery. In these studies, however, a direct comparison with synthetic sgRNAs with modifications was not performed, and it remains possible that the generally most active form of an sgRNA, even in the context of RNP delivery, is one that is synthetically manufactured with end modifications protecting against endogenous exonuclease degradation and innate immune stimulation.

Creating a locus-specific double-strand break (DSB) with engineered nucleases forms the foundation of genome editing 12,13 . DSBs can be resolved by one of the two highly conserved competing repair mechanisms, nonhomologous end-joining (NHEJ) and HR 14 . NHEJ repair is the default pathway that functions throughout the cell cycle to repair breaks by ligation of DNA ends without end processing, sometimes resulting in small insertions or deletions (INDELs) at the site of the break. By contrast, HR is normally most active during the S or G2 phase of the cell cycle, when an undamaged sister chromatid is available to serve as an HR repair template. HR can be harnessed for creating precise DNA changes by supplying an exogenous DNA donor template, as long as the donor has homology arms that are identical to the region surrounding the DSB. This way, disease-causing single-nucleotide polymorphisms (SNPs) can be reverted or entire open reading frames can be inserted at specific genomic sites 15 ( Fig. 1 ). Although no studies using CRISPR and rAAV6 donors in CD34 + HSPCs have comprehensively investigated the impact of homology arm size and position relative to the nuclease cut site, studies using transcription activator–like effector nucleases (TALENs) and plasmid donors have found that maximal HR-mediated editing occurs when both the homology arms are at least 400 bp 16 .

Schematic overviews of design strategies for different donor types. (a) A reporter gene expression cassette driving the expression of, for example, a fluorescent protein (FP) can be integrated site-specifically by homologous recombination. 400-bp homology arms (in gray) are split at the CRISPR/Cas9 cut site between nucleotides 17 and 18 of the sgRNA target site (target site depicted in white and PAM in red) and flank the transgene expression cassette (in blue). Upon HR, the cassette is integrated seamlessly into the cut site. (b) SNPs can be introduced (X → Y mutation depicted, blue and green, respectively) using a vector design with 1.2-kb homology arms that flank the region between the desired site of the SNP and the CRISPR/Cas9 cut site. In the donor, the region between the mutation and the cut site should be mutated (the example uses synonymous mutations denoted by asterisks; note that encoded amino acids are listed above nucleotides) to avoid early termination of the HR process due to full sequence homology. This also introduces necessary mutations to the PAM and sgRNA target site to prevent Cas9 recutting and INDEL formation after HR. (c) A cDNA sequence (in green, diverged using synonymous mutations, e.g., as depicted in b) can be introduced directly into the start codon (ATG, purple) of a gene to express a desired cDNA from the endogenously regulated expression elements. A separate expression cassette (blue) can be included after the cDNA, encoding, for example, a fluorescent protein (FP), which allows tracking and/or enrichment of targeted cells. The cDNA and reporter cassette are flanked by two 400-bp homology arms (gray) that flank the start codon and the CRISPR/Cas9 cut site. Seamless HR ensures that the cDNA is integrated in frame with the start codon. In an analogous manner, a 2A-cDNA cassette can be integrated immediately before the stop codon to link expression of a transgene to the expression of an endogenous gene.
FP, fluorescent protein; LHA, left homology arm; pA, polyadenylation signal; RHA, right homology arm; SNP, single-nucleotide polymorphism.

Recombinant adeno-associated viruses (rAAVs) have been shown to naturally mediate high frequencies of HR in mammalian cells without stimulation of DSBs 17� . Wild-type AAVs are non-enveloped single-stranded DNA viruses that consist of an ~4.7-kb genome encoding replication (rep) and capsid (cap) genes between 145-bp inverted terminal repeats (ITRs) 19 . Naturally occurring and engineered serotypes exist that have differential tropism, and serotype 6 has been shown to be the most efficient serotype for transduction of HSPCs and primary T cells 20� . AAVs are dependent upon adenoviruses (helper viruses) to replicate in cis; however, packaging of rAAV vectors with user-defined DNA cargo is possible by cotransfection of the following three types of plasmids into a host cell line such as HEK293T: (i) transfer plasmids with the homology arms and transgene of interest between the two ITRs, (ii) rep/cap-encoding plasmids (in this protocol rep2/cap6), and (iii) helper plasmids encoding the adenoviral helper proteins (in this protocol Ad5). As rAAVs can generate a high vector copy number in the nucleus and avoid innate immunity, they are ideal templates for HR following use of engineered nuclease-mediated DSBs in primary human cells. Accordingly, several recent reports have efficiently used rAAV6 as donor template for HR in primary human HSPCs 21� .

Human hematopoiesis is the process that generates all blood and immune cells from HSCs with self-renewing capacity 25,26 . Although progress has been made in identifying bona fide repopulating and self-renewing HSCs by immunophenotyping 27,28 , the CD34 + cell surface marker has been readily used to identify a heterogeneous population of HSPCs. This population generally contains ~0.1–1% HSCs with long-term repopulation capacity (LT-HSCs). This capacity is normally experimentally tested in immunodeficient mice, e.g., in nonobese diabetic (NOD)-severe combined immunodeficiency (SCID)-gamma (NSG) mice, which lack innate and adaptive immunity to allow human cell engraftment in the mouse bone marrow 29 . Several reports have recently shown that HR is more efficient in progenitor cells, as compared with LT-HSCs 10,15,30,31 . Although current investigations attempt to augment HR rates in LT-HSCs, this is the biggest hurdle in accelerating clinical strategies for HR-based genome editing for blood and immune system disorders. We therefore devised a reporter-based enrichment paradigm that takes advantage of a log-fold-higher transgene expression after successful HR into the desired locus (compared with low AAV6 episomal expression), and this methodology could yield strikingly higher frequencies of modified cells in the transplanted mice (up to 97%) 15 . Thus, the enrichment methodology can solve the potential problem of inefficient HSC targeting.

In this protocol, we provide a reproducible methodology for achieving HR in HSPCs using CRISPR/Cas9 and rAAV6 homologous donor delivery. We describe in detail (i) sgRNA selection and AAV6 homologous donor design and construction, (ii) electroporation and transduction protocols, (iii) a flow cytometry–based strategy to enrich for a population of HSPCs with > 90% targeted integration, and (iv) in vitro and in vivo assays for determining HR frequencies in HSPCs. We also describe the use of the protocol in primary human T cells (Box 1). This is a comprehensive protocol for targeting human HSPCs for HR to investigate hematopoietic gene function and disease modeling, as well as preclinical development of HSC-based cell and gene therapies.

Box 1

Homologous recombination in primary human T cells ● TIMING 6–8 h hands-on, 5–7 d of culture

We have found that an identical protocol using the same reagents as described below can achieve up to 60% HR frequencies in T cells. Using CRISPR/Cas9 and AAV6, the transgene expression shift upon HR, which allows early enrichment of cells that have undergone HR ( Fig. 2 ), is also apparent in T cells as early as Day 2 after electroporation and transduction.

Enrichment of gene-targeted CD34 + HSPCs using CRISPR/Cas9, AAV6, and FACS methodologies. (Left) Representative CD34 + HSPC FACS plots from day 4 post electroporation of Cas9 RNP and transduction of AAV6 (top) and transduction of AAV6 only (bottom) are shown, highlighting the generation of a reporter high (GFP high , shown in the red gate) population after the addition of Cas9 RNP (see also Supplementary Figure 1 for FACS plots that include staining for CD34 expression). At day 4 post electroporation, targeted HSPCs from GFP high (red), GFP low (green), and GFP neg (blue) fractions were sorted and cultured for 15 d while monitoring GFP expression by flow cytometry every 3 d (right). Note that the reporter high population is > 96% reporter + after 15 d in culture, highly indicative that this population is enriched for stable integration of the reporter cassette. neg, negative SSC, side scatter. Image adapted with permission from ref. 15, Springer Nature.

The above figure shows representative FACS plots from Day 4 post electroporation of T cells with Cas9 RNP or without RNP (Mock) and then transduced with AAV6 vectors carrying an mCherry expression cassette flanked by homology arms for the targeted locus ( Fig. 1a ).

Reagents

Ficoll-Paque PLUS (1.078 g/ml; GE Healthcare, cat. no. 17-1440-03)

Pan T Cell Isolation Kit (Miltenyi Biotec, cat. no. 130-096-535)

Anti-human CD3 antibody (BioLegend, cat. no. 317347)

X-VIVO 15 with Gentamicin, L-Glutamine, and Phenol Red (Lonza, cat. no. 04-418Q)

Human serum (Sigma-Aldrich, cat. no. H3667)

Anti-human CD28 antibody (Tonbo Biosciences, cat. no. 70-0289-U100)

IL-2, human (PeproTech, cat. no. 200-02)

IL-7, human (BD, cat. no. 554608)

Dynabeads Human T-Activator CD3/CD28 (Fisher Scientific, cat. no. 11132D)

Procedure

Purify PBMCs from buffy coats using standard Ficoll-based separation.

! CAUTION The use of tissue that is collected from human subjects requires approval by the local institutional review boards.

Isolate CD3 + T cells (Pan T cell isolation) from the PBMCs using the Pan T Cell Isolation Kit.

Directly after T cell isolation, stimulate cells by culturing them for 3 d at 1 million cells per well in a 24-well plate coated with anti-human CD3 antibody (plate precoated for 2 h at 37 °C with 300 μl of PBS with 10 μg of purified anti-human CD3 antibody per well) in X-VIVO 15 serum-free medium containing 5% (vol/vol) human serum, 1 μg/ml anti-human CD28 antibody, 100 IU/ml human IL-2, and 10 ng/ml human IL-7. As an alternative to the CD3 and CD28 antibodies, human CD3/CD28 Dynabeads can be used at a bead-to-cell ratio of 1:1.

Three days after stimulation, electroporate cells and transduce with AAV6 donor vectors, as described in Steps 3� of the PROCEDURE using T-cell media as described above, but without anti-human CD28 antibody. As for CD34 + HSPCs, we strongly recommend that functional titration of the AAV vector be performed in HR experiments to identify the lowest MOI that yields maximum HR frequencies and high viabilities.

Comparison with other technologies

Just like Cas9, other engineered nucleases, such as zinc-finger nucleases (ZFNs), TALENs, and hybrid meganuclease-TALENs (megaTALs), can stimulate DSBs in mammalian genomes. However, Cas9 has two distinct advantages as compared with these other designer nucleases in stimulating DSBs in HSPCs. First, ZFNs, TALENs, and megaTALs are more cumbersome to construct and require an extensive molecular biology skill set. By contrast, the CRISPR/Cas9 system from Streptococcus pyogenes uses a simple 20-nt guide sequence to facilitate a locus-specific DSB. Furthermore, the chimeric sgRNA can be chemically synthesized with modifications at the ends that make it highly efficient in primary human HSPCs. Second, recombinant Cas9 protein can be easily produced and precomplexed with sgRNAs on the bench and delivered by electroporation as RNP complexes, which shortens nuclease exposure and provides a hit-and-run mechanism that decreases potential unwanted off-target DSBs.

Although rAAV6 can serve as a homologous donor template, there are other donor template platforms that have been shown to mediate efficient HR in HSPCs; these include single-stranded oligonucleotides (ssODNs) 10,31 and integration-defective lentiviral vectors (IDLVs) 30 . Although current literature may suggest that rAAV6 is a more efficient donor than IDLV 21 , a thorough comparison has yet to be performed. In comparison with ssODNs, which are usually ~50� bp in total size and can therefore mediate only small genomic changes, rAAV6 donor templates can mediate precise SNPs, as well as insert transgene cassettes up to ~4 kb in size if a single donor vector is used, or can insert even larger transgene cassettes if a sequential HR strategy with two donor vectors is used 32 . If single-point mutations are the desired genomic change, ssODNs may be a useful donor template for use in HSPCs, as they are easily produced and have been shown to work well in vitro 10 . However, ssODNs are too small to encompass an expression cassette for a reporter gene, and are therefore not compatible with the enrichment protocol for HSCs with precise targeted integration by flow cytometry that we outline in this protocol, which we believe is the key step for eliminating the greater number of nontargeted HSCs that outcompete targeted HSCs for engraftment in the host bone marrow.

Precise genome editing in HSCs has several advantages compared with conventional lentiviral vector (LV)-based gene transfer methods. First, genome editing maintains endogenous regulatory elements, preserving physiologic spatiotemporal regulation of gene expression 1,3 . Second, as LVs integrate semirandomly within the genome, the possibility always remains of insertional mutagenesis near or in oncogenes and/or tumor suppressor genes, confounding experimental results and possibly, although fortunately not yet described, leading to leukemogenesis in therapeutic settings. Third, semirandom lentiviral integration leads to expression heterogeneity among cells, which can be a major confounder in understanding gene-cell function in a heterogeneous population such as HSPCs. For gene therapy, this creates a population of cells with differential potency, potentially requiring higher cell doses or higher vector copy numbers. Fourth, LV-based gene addition does not allow targeted gene knockout by integration of a reporter expression cassette into the gene of interest, which enables tracking and enrichment of knockout cells. These reasons make genome editing the preferred genetic manipulation strategy to elucidate HSC gene function, as well as to correct disease-causing genetic mutations for HSC-based therapies.

Limitations of the protocol

One of the main limitations of the CRISPR/Cas9 system is the need for a protospacer adjacent motif (PAM) sequence within the gene of interest; however, the 5′-NGG-3′ PAM for SpCas9 can on average be found every 8� bp in the human genome and thus does not usually hinder application 33 . Cas9 variants have been engineered with other PAM specificities that might circumvent this problem, although these engineered variants have not as yet been shown to mediate high levels of editing in primary human cells 34,35 . In addition, appropriate donor design (Supplementary Methods), as described in this protocol, can also usually solve the problem when the CRISPR/Cas9 nuclease site is at a distance from the desired change. A limitation of our enrichment methodology is the need for an exogenous promoter driving the reporter transgene for enrichment (such as GFP or truncated nerve growth factor receptor (tNGFR)) for purifying targeted HSPCs early in the manufacturing process via flow sorting. We do note that the enhanced transgene expression after HR seems to be independent of the chosen promoter, which would allow the use of promoters with varying strengths in HSPCs or inducible promoters. Furthermore, it is possible that active endogenous promoters can be used to drive expression from enrichment cassettes for purifying targeted HSPCs. Nevertheless, the enrichment scheme is fundamental in removing nontargeted HSCs that can outcompete targeted HSCs for repopulation in the host bone marrow.

Experimental design

sgRNA design

Excellent protocols for designing and testing sgRNAs have been described previously 33 . Here, we outline the key steps for identifying and characterizing a highly active locus-specific sgRNA, which is key to achieving optimal HR frequencies in HSPCs (Supplementary Methods). In brief, we routinely screen four to eight sgRNAs (depending on the available PAM sites in the targeted region) in the immortalized K562 cell line. Once a potent candidate sgRNA has been identified, a chemically modified synthetic sgRNA is ordered (Reagents section) from TriLink Biotechnologies (1-μmole minimal synthesis) or Synthego (3-nmole minimal synthesis; as new commercial sources for full-length synthetic modified sgRNAs become available, these might be used as well) and is functionally validated in human CD34 + HSPCs. Alternatively, we routinely bypass the functional screen in K562 cells and directly screen a panel of chemically modified synthetic sgRNAs in CD34 + HSPCs. Once high activity (as measured by the frequency of insertions and deletions using tracking of INDELs by decomposition (TIDE) analysis) of an sgRNA in HSPCs has been confirmed, a homologous donor template to introduce a desired genomic change is designed and cloned into an AAV vector plasmid. In general, the CRISPR cut site (between base pairs 17 and 18 of the 20-nt complementary sequence of the sgRNA) should ideally be located as close to the intended genomic change as possible 36 , although with proper design of the donor vector, we have observed substantial activity at a distance of 40 bp from the cut site 15 . In our experience, the farther the homology arms from the break site, the lower the HR efficiencies.
When adding transgene expression cassettes into a safe harbor locus, the location of the sgRNA is less restricted, but if adding full cDNA cassettes to be driven by the correct endogenous regulatory context, e.g., directly into the start codon of a gene, the sgRNA target site should be located as close to the intended integration site as possible.
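As a rough illustration of the geometry described above (a 20-nt target followed by an NGG PAM, with the cut between nucleotides 17 and 18), the following sketch lists candidate SpCas9 sites in a sequence. It is a simplification under stated assumptions: only the forward strand is scanned, no activity or off-target scoring is done, and the function name `find_spcas9_sites` is hypothetical.

```python
def find_spcas9_sites(seq):
    """List candidate SpCas9 target sites on the forward strand only.

    Each hit is (start, protospacer, pam, cut_index), where `start` is the
    0-based index of the first protospacer base and `cut_index` is the index
    of the first base after the cut (between protospacer nt 17 and 18).
    """
    seq = seq.upper()
    sites = []
    for start in range(len(seq) - 22):
        protospacer = seq[start:start + 20]
        pam = seq[start + 20:start + 23]
        if pam[1:] == "GG":  # NGG: any base followed by two guanines
            sites.append((start, protospacer, pam, start + 17))
    return sites


def distance_to_edit(cut_index, edit_index):
    """Distance in bp between the cut site and the intended genomic change."""
    return abs(edit_index - cut_index)


example = "TTTTT" + "ACGTACGTACGTACGTACGT" + "AGG" + "TTTTT"
sites = find_spcas9_sites(example)
```

A real design workflow would also scan the reverse complement and rank candidates experimentally, as the text describes for the K562 or CD34 + HSPC screens.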

Donor design

The standard HR donor design is two homology arms (symmetric in length) that flank the transgene expression cassette or the mutations that will be introduced (Fig. 1a; Supplementary Methods). Homologous DNA donor design has been well described in excellent protocols referenced here 37,38 . Here, we will describe, in brief, the key concepts to consider when designing an AAV6 homologous DNA donor for CRISPR/Cas9-stimulated HR in HSPCs. Furthermore, details outlining the cloning of HR donors into an AAV plasmid backbone, and the production and purification of AAV vector particles are presented in the Supplementary Methods. The AAV packaging capacity of 4.7 kb limits the size of the donor. The ITRs and homology arms are indispensable elements of the vector, and the ITRs are combined at ~270 bp. We recommend homology arms no shorter than 400 bp each, and we have not observed any substantial advantage of extending the length 16 , nor any advantage to using self-complementary AAV6 (double-stranded) over single-stranded AAV6 templates (R.O.B., D.P.D., and M.H.P., data not shown). This leaves ~3.6 kb for the insert, so for HR of large transgenes or multicistronic cassettes, careful consideration must be made when designing this. For small genome modifications, such as introduction of SNPs, we recommend 1,200-bp homology arms to keep the total vector size > 2.4 kb, which is greater than half of the AAV packaging capacity and prevents concatemer packaging 39 (Fig. 1b). If the HR process is to integrate DNA exactly at the double-strand break, the arms should be split at the CRISPR cut site (Fig. 1a). If the desired modification is separate from the CRISPR cut site, the homology arms should flank the region between the cut site and the site of the modification (Fig. 1b,c). Two important considerations must be made in this situation: (i) If the target site is not disrupted by the introduced modification, Cas9 might introduce an INDEL after HR has taken place. To avoid this, mutations must be introduced (synonymous, if necessary) in the PAM or the sgRNA target site of the donors 40 (Fig. 1b). (ii) If there is sequence homology between the donor and the target DNA in the region between the cut site and the introduced changes, termination of the HR process may occur due to the nascent DNA strand leaving the donor and annealing back to the chromosomal DNA in this region. To avoid this, mutations (synonymous, if necessary) should be introduced to minimize homology in the region between the cut site and the site of the desired change (Fig. 1b). Several studies have shown that mutations at codon wobble positions suffice to prevent premature HR termination 15,41 . If a cDNA cassette is to be integrated at the exact position after the start codon, the sgRNA target site should be located as close to the start codon as possible to minimize the gap between the start codon and the cut site, which, if too large, can negatively influence HR rates (Fig. 1c).
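The size budget in this section is simple arithmetic, sketched below using the figures quoted in the text (4.7-kb packaging capacity, about 270 bp for both ITRs combined, 400-bp or 1,200-bp homology arms). The constant and function names are illustrative only, and the numbers are treated as exact for the sake of the example.

```python
AAV_CAPACITY_BP = 4700   # packaging capacity quoted in the text
ITR_TOTAL_BP = 270       # both ITRs combined, as quoted in the text


def max_insert_bp(arm_bp=400):
    """Space left for the insert after ITRs and two homology arms."""
    return AAV_CAPACITY_BP - ITR_TOTAL_BP - 2 * arm_bp


def vector_size_bp(arm_bp, insert_bp):
    """Total vector genome size for a given arm length and insert size."""
    return ITR_TOTAL_BP + 2 * arm_bp + insert_bp


def fits_and_avoids_concatemers(arm_bp, insert_bp):
    """True if the vector fits the capsid and exceeds half the capacity."""
    size = vector_size_bp(arm_bp, insert_bp)
    return AAV_CAPACITY_BP / 2 < size <= AAV_CAPACITY_BP


reporter_budget = max_insert_bp(400)          # about 3.6 kb, as in the text
snp_donor_size = vector_size_bp(1200, 0)      # SNP donor with 1,200-bp arms
```

With 400-bp arms the budget comes out at 3,630 bp, matching the roughly 3.6 kb stated above, and a SNP donor with 1,200-bp arms stays above half the packaging capacity even with no insert.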

rAAV vector production

Excellent and comprehensive protocols have already described the production, purification, and titering of recombinant AAVs (rAAVs) 38,42 , and these provide sufficient resources to make rAAV6 (Supplementary Methods). We have detailed the critical steps for rAAV6 vector production and purification in the Supplementary Methods. In brief, rAAV vectors are commonly produced in HEK293T cells by the triple-transfection method, which involves transfection of (i) an ITR-containing transfer plasmid, (ii) a plasmid expressing the adenoviral proteins and RNAs required for helper functions, and (iii) a plasmid expressing the AAV rep and cap proteins that define the serotype. We use a transfer plasmid carrying ITRs from AAV2 (Reagents) and assemble the donor plasmid by standard Gibson assembly of the linear transfer plasmid backbone with terminal ITRs and PCR fragments containing the homology arms amplified from genomic DNA and the insert amplified from a desired expression plasmid. SNP donors can be made in a similar manner using standard site-directed mutagenesis techniques to introduce the SNPs. For AAV production, we use the dual-plasmid transfection system previously described 38 , in which the required adenoviral and AAV genes are combined on a single helper plasmid, pDGM6, which was described previously 43 and can be obtained from the Russell lab at the University of Washington (Reagents). We routinely use rAAV vectors purified from iodixanol gradients for HR experiments in HSPCs, but note that other purification methods work as long as they produce pure rAAV preparations with a low proportion of empty capsids. A high-titer preparation is essential to achieving high transduction rates and avoiding toxicity induced by the dilution of the stem cell culture media when using high volumes of AAV vector. We recommend that the total volume of AAV not exceed 20% of the total culture volume (ideally, < 10%). 
For titering the AAV preparations, we suggest using quantitative PCR on the ITRs to quantify the number of vector genomes as described previously 44 . It is important to note that the titer obtained is not a functional titer. A wide range of different multiplicities of infection (MOIs, i.e., vector genomes (vg) per cell) have been used in studies performing HR in HSPCs using AAV6. For example, Wang et al. used MOIs of 1� × 10^3 vg/cell 21 , whereas De Ravin et al. used MOIs of 1–3 × 10^6 vg/cell 23 . We believe that this large difference is due to differences in titering methodology and/or differences in AAV purity. We suggest that AAV titers be normalized to the AAV2 Reference Standard Material obtainable from the American Type Culture Collection (ATCC, cat. no. VR-1616) 45 . This will allow more-streamlined titers, but will not address differences in AAV purity. We recommend that functional AAV titration be performed in HR experiments in HSPCs to identify the lowest MOI that yields maximum HR frequencies with minimal cellular toxicity.
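The relationship between MOI, titer, and the volume limits mentioned above can be sketched as follows. This is only a bookkeeping aid under assumptions: the titer is expressed in vg/µl, the example numbers are invented for illustration, and the function names are hypothetical. The 20% and 10% thresholds are the ones recommended in the text.

```python
def aav_volume_ul(moi_vg_per_cell, n_cells, titer_vg_per_ul):
    """Volume of AAV prep needed to reach a target MOI (vector genomes per cell)."""
    return moi_vg_per_cell * n_cells / titer_vg_per_ul


def volume_check(aav_ul, culture_ul):
    """Classify the AAV volume against the limits recommended in the text.

    The fraction is computed relative to the total culture volume given.
    """
    fraction = aav_ul / culture_ul
    if fraction > 0.20:
        return "too much: exceeds 20% of culture volume"
    if fraction >= 0.10:
        return "acceptable: under 20%, but ideally below 10%"
    return "ideal: below 10% of culture volume"


# Invented example values: MOI of 1e4 vg/cell, 1e6 cells, titer of 1e10 vg/µl.
needed_ul = aav_volume_ul(1e4, 1e6, 1e10)
verdict = volume_check(needed_ul, culture_ul=500)
```

Because the qPCR titer is not a functional titer, a calculation like this only gives a starting point for the functional titration recommended above.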

Enriching cells that have undergone homologous recombination (Steps 47�)

An intrinsic feature of single-stranded AAV is that expression relies on second-strand synthesis, which subsequently produces dsDNA that is transcriptionally competent. We have recently reported on the observation that episomal AAV6 expression leads to low levels of reporter gene expression, even when driven from constitutive strong viral promoters in primary cells 15 . By contrast, we found that the reporter expression was increased more than a log-fold after HR-mediated chromosomal integration of the same expression cassette (Fig. 2; Supplementary Fig. 1). We have found that this phenomenon does not depend on cell type, target locus, exogenous promoter, or designer nuclease system used. The reporter shift is apparent as early as 24 h after electroporation and transduction (peaks at 3–4 d), and depends on transgene expression kinetics and proliferation status of the cells. We have used this shift in transgene expression to sort cells with reporter high expression and have obtained a purified targeted population with up to 99% purity. Importantly, we have shown in serial transplants that this enriched targeted population contains LT-HSCs.

Colony-forming unit assay and clonal genotyping (Steps 22�)

The colony-forming unit (CFU) assay is a progenitor assay that assesses the potential of progenitor cells to form colonies in semi-solid methylcellulose media. This assay monitors the number of progenitor cells in the targeted population, as well as the proportion of lineage-committed progenitors (myeloid: CFU-GM; erythroid: BFU-E and CFU-E) and multilineage progenitors (mixed myeloid and erythroid: CFU-GEMM). It also enables the analyses of clonal genotypes by PCR screening of colonies for targeted integration of a transgene expression cassette. For this, an ‘In–Out’ PCR approach is used, in which one primer is located in the targeted genomic locus outside the region of the homology arm and the other primer is located inside the transgene cassette (Fig. 3a). Preferably, two PCRs are performed, identifying both the 5′ and the 3′ junction of the integration. If desired, a third primer can be included for identifying alleles without integration. This primer should be located in the genomic region on the opposite side of the sgRNA target site from the primer outside the homology arm (Fig. 3). It is critical that this primer be located at a distance of at least 50 bp away from the sgRNA cut site, as INDELs may otherwise disrupt the primer-binding site. HR frequencies for integration of SNPs can be assessed by droplet digital PCR or sequencing approaches (such as next-generation sequencing or TOPO cloning), or, if the genomic modifications generate a novel restriction site, restriction fragment length polymorphism (RFLP) analysis can quantify HR rates as described thoroughly before 33 . The same primer design as outlined above should be used with one primer located outside the region of the homology arms.

‘In–Out’ PCR strategy for genotyping on-target integration events in methylcellulose-derived colonies. (a) Schematic outlining the ‘In–Out’ PCR strategy for identifying on-target integration events in HSPCs. (Top) Primer design for the nontargeted allele using one primer that binds outside the left homology arm (LHA), and another primer that binds inside the right homology arm (RHA). Primers are depicted as red arrows. In the example presented, the PCR strategy will produce an 800-bp product (red) from a wild-type (WT)/INDEL allele. Note that the molecular weight of this band could be smaller or larger if an INDEL of substantial size is present. (Bottom) A targeted allele with a reporter cassette after CRISPR/Cas9 and AAV6-mediated homologous recombination. By using the same outside LHA (Out) primer as above, but with a reporter cassette–specific inside primer (In), this ‘In–Out’ PCR strategy will generate a 600-bp on-target integration-specific PCR product (purple). Primers are depicted as purple arrows. (b) A schematic representation of an agarose gel image showing the types of clonal integration events (when targeting an autosomal gene with one allele on each chromosome): WT or INDEL (800 bp, red), biallelic HR (600 bp, purple), and monoallelic HR (800 and 600 bp). Note that the presented PCR strategy is a three-primer PCR that analyzes all events in the same PCR. It is possible to separate the strategy into two PCR reactions. Furthermore, it is recommended to perform the same ‘In–Out’ strategy at the 3′ end of the integration and, importantly, to Sanger-sequence PCR bands to confirm seamless HR at both ends.
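The band-pattern logic in the legend above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 800-bp and 600-bp sizes are the example values from the legend, while the function names and the 50-bp tolerance are hypothetical and would need to be adapted to a real primer design. As the legend notes, a large INDEL can shift the WT band, so a "no call" result should be inspected by hand.

```python
# Band sizes follow the example in the figure legend: 800 bp for a
# WT/INDEL allele and 600 bp for an on-target integration.
WT_BAND_BP = 800
HR_BAND_BP = 600


def has_band(observed_bp, expected_bp, tolerance_bp=50):
    """Return True if any observed band lies within tolerance of expected_bp."""
    return any(abs(band - expected_bp) <= tolerance_bp for band in observed_bp)


def classify_colony(observed_bp, tolerance_bp=50):
    """Classify one colony from the band sizes of the three-primer PCR.

    tolerance_bp is a hypothetical gel-reading tolerance, not a value
    taken from the protocol.
    """
    wt = has_band(observed_bp, WT_BAND_BP, tolerance_bp)
    hr = has_band(observed_bp, HR_BAND_BP, tolerance_bp)
    if wt and hr:
        return "monoallelic HR"
    if hr:
        return "biallelic HR"
    if wt:
        return "WT or INDEL"
    return "no call"


if __name__ == "__main__":
    for bands in ([800], [600], [810, 595], []):
        print(bands, "->", classify_colony(bands))
```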

Repopulation assay in transplanted immunocompromised mice (from Step 53)

Importantly, the CFU assay does not analyze the presence of HSCs with self-renewal and multilineage capacity, and for such verification, transplantation into immunocompromised mice is necessary. Although several mouse strains have been used for human hematopoietic repopulation, we recommend using the NSG strain, which is highly supportive of human engraftment and hematopoietic repopulation. If using other strains, more cells may need to be transplanted for robust engraftment. If transplanting bulk, nonenriched cells (RNP electroporation + AAV6 transduction) or RNP-electroporated CD34+ HSPCs without AAV6 donor transduction, transplantation can be performed as early as 2 h to 2 d after electroporation. HSPCs enriched for targeting via a reporter gene are transplanted directly after enrichment, when the frequency of reporter-positive cells peaks (from Step 17). Different standards for confirming long-term repopulating capacity of stem cells in NSG mice have been reported, starting from 12 weeks, in both primary and secondary recipients. Our recommendation is to assess engraftment 16 weeks after primary transplantation, and optionally perform secondary transplants and analyze these after 12 weeks to confirm true repopulating capacity.

Controls

As a positive control for Cas9/sgRNA activity in CD34+ HSPCs, we refer to the published sgRNA targeting the HBB gene and the matched HBB donor vector encoding GFP 15. For all targeting experiments, a mock-electroporation control (no Cas9 RNP) is essential. This should be split into two wells: one that receives the AAV donor and one that does not. The latter is used for flow cytometric gating of reporter+ cells, and the former is used to set the reporter-high gate (Fig. 2). Similarly, for the colony-forming unit assay and engraftment studies in NSG mice, a mock control will serve as a positive control and a potency reference for colony formation/distribution and engraftment.


New Tools for Cost-Effective DNA Synthesis

Nicholas Tang, ... Jingdong Tian, in Synthetic Biology, 2013

Gene Assembly

Chemical oligonucleotide synthesis accumulates errors due to side reactions and inefficiencies in the stepwise reactions. 43 Although oligonucleotides of up to 600 bp in length can be synthesized, yields are extremely low. 44 Gene assembly becomes necessary for synthesis of longer constructs. Almost all current assembly techniques use a combination of PCR or ligation-based assembly. The advantage of these methods over restriction digestion/ligation methods is that they can perform scarless and sequence-independent assembly.

Ligation-based assembly uses thermo-stable DNA ligase to join pre-synthesized oligonucleotides. Successful use of ligation-based assembly has been demonstrated in research and commercial synthesis platforms. 45–47 Thermo-stable ligase is advantageous compared to T4 DNA ligase, because fewer DNA secondary structures will form at elevated ligation temperatures. 48–50 Ligation-based assembly involves ligation and PCR amplification. Overlapping oligonucleotides are designed to completely cover both strands of the sequence. They are phosphorylated at the 5′-ends for the ligation reaction. The oligonucleotides are first heat denatured, and then cooled for annealing and ligation at 50–65°C. Because the ligation reaction is linear, the full construct is designed with flanking primer sequences so that PCR can amplify the construct.

PCR-based assembly, otherwise known as polymerase cycling assembly (PCA), remains one of the most cost-effective gene assembly methods. 51,52 In a two-step procedure, partially overlapping oligonucleotides are designed to span the whole sequence. 53 Because gaps between oligonucleotides on the same strand are allowed, the number of starting oligonucleotides is fewer than that required for ligation-based assembly. A PCR reaction is carried out so that overlapping oligonucleotides anneal and extend. The resulting double-stranded construct can serve as the template for the PCR amplification reaction. In a single-step procedure, the amplification primers are included with the oligonucleotides for a combined assembly and amplification reaction. 54,55 Although extra cycles may be needed, this procedure is more easily multiplexed than assembly with multiple steps. 53 Although PCR-based methods are efficient, constructs involving repetitive sequences or secondary structures may have difficulties during PCR, and thus can better be assembled with ligase-based assembly. 56
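A minimal sketch of how partially overlapping oligonucleotides might be laid out for PCA is given below. It assumes a simple fixed-length tiling with alternating strands; the function names and the default lengths (40-nt oligonucleotides, 20-nt overlaps) are hypothetical and are not taken from the chapter. Real designs would also balance the melting temperatures of the overlaps, which this sketch ignores.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def design_pca_oligos(seq, oligo_len=40, overlap=20):
    """Tile seq with oligos that overlap their neighbors by `overlap` nt.

    Oligos alternate between the top (+) and bottom (-) strand, so each
    one anneals to the next through the shared overlap. Returns a list of
    (start, end, strand, oligo_sequence) tuples with 0-based, end-exclusive
    coordinates on the top strand.
    """
    if not seq:
        raise ValueError("sequence must not be empty")
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be between 0 and oligo_len")
    step = oligo_len - overlap
    oligos = []
    start = 0
    index = 0
    while True:
        end = min(start + oligo_len, len(seq))
        fragment = seq[start:end]
        strand = "+" if index % 2 == 0 else "-"
        oligo = fragment if strand == "+" else reverse_complement(fragment)
        oligos.append((start, end, strand, oligo))
        if end == len(seq):
            break
        start += step
        index += 1
    return oligos


if __name__ == "__main__":
    demo = "ACGT" * 25  # hypothetical 100-nt construct
    for start, end, strand, oligo in design_pca_oligos(demo, 40, 10):
        print(start, end, strand, oligo)
```

When the overlap is shorter than half the oligonucleotide length, consecutive oligonucleotides on the same strand are separated by a gap, which is the property that lets PCA use fewer starting oligonucleotides than ligation-based assembly.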

There are a number of related overlapping extension techniques for sequence-independent assembly. For the sequence-independent assembly of circular double-stranded constructs or plasmids, circular polymerase extension cloning (CPEC) 57 can successfully assemble not only multiple-gene constructs, but also combinatorial sequence libraries. 19 In a single-step reaction, overlapping oligonucleotides are assembled and circularized. Other approaches suitable for plasmid construction include the In-Fusion commercial kit from Clontech, 58 Uracil-specific excision reagent (USER) 59 and Sequence- and Ligation-Independent Cloning (SLIC). 60 Errors may arise during the assembly of large constructs. Gibson isothermal or ‘chewback and anneal’ assembly, which relies on T5 exonuclease, avoids length-dependent errors, allowing for assembly of genome-length constructs, 61 such as a 16.3-kb mitochondrial genome. 62

Rather than cleaving oligonucleotides from microchips, Quan et al. use an approach involving isothermal nicking and strand displacement amplification (nSDA) of immobilized microarray oligonucleotides. 19 This simultaneously amplifies and releases oligonucleotides, which are then PCA assembled into 1-kb constructs. A microfluidic system serves to integrate synthesis, amplification and assembly (Fig. 1.3). 19,63 By performing all steps on chip, high-throughput assembly can be easily coupled with downstream reactions.

Figure 1.3. Integrated on-chip DNA amplification and gene assembly. The microchips are divided into physically isolated sub-arrays where oligonucleotides are amplified by isothermal nicking and strand displacement amplification. The released strands are then assembled into 0.5–1 kb gene fragments within the wells.

Although high-fidelity DNA microchips can hold up to a million unique oligonucleotides, they are difficult to scale. Microarray oligonucleotide pools are highly complex, which creates complications such as potential cross-hybridization between assembled fragments. More successful scaling will lower the cost of high-quality gene synthesis. To address this issue, Kosuri et al. combined selective oligonucleotide pool amplification, optimized gene assembly, and enzymatic error correction. 12 The authors cleaved microchip-synthesized oligonucleotides. These oligonucleotides were synthesized with flanking amplification primer annealing sites corresponding to several subpools. A quarter-million specific amplification primers were then used to selectively PCR amplify specific subpools of oligonucleotides from the original complex background of oligonucleotides. The flanking amplification primer sequences were then cleaved with restriction enzymes, which allows for seamless subsequent PCR assembly into gene constructs. The authors tested the system on 47 genes encoding proteins and antibodies, including 40 error-free single-chain antibody genes that had previously been shown to be difficult to synthesize due to high GC content and repetitive sequences. The assembly optimization and enzymatic treatment allowed for accurate and low-cost synthesis, bringing costs down to an estimated USD 0.01/bp.


Genome Editing in Plants: An Overview of Tools and Applications

The emergence of genome manipulation methods promises a real revolution in biotechnology and genetic engineering. Targeted editing of the genomes of living organisms not only permits investigations into the understanding of the fundamental basis of biological systems but also allows addressing a wide range of goals towards improving productivity and quality of crops. This includes the creation of plants with valuable compositional properties and with traits that confer resistance to various biotic and abiotic stresses. During the past few years, several novel genome editing systems have been developed; these include zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and clustered regularly interspaced short palindromic repeats/Cas9 (CRISPR/Cas9). These exciting new methods, briefly reviewed herein, have proved themselves as effective and reliable tools for the genetic improvement of plants.

1. Introduction

Since the advent of recombinant DNA technology in Paul Berg’s laboratory [1] in 1972, genetic engineering has come a long way and achieved enormous success. Many molecular and genetic mechanisms and phenomena have been discovered and studied in detail and the knowledge accumulated now permits researchers to reproduce experiments in vitro. Several decades-long investigations in molecular genetics and biochemistry of bacteria and viruses have allowed researchers to develop new methods of manipulating DNA through creation of various vector systems and tools for their delivery into the cell. All of these developments allow successful creation of not only transgenic microorganisms but also genetically modified higher organisms including various plant and crop species. Creation of novel tools for breeding and biotechnology, an application area of genetic engineering, has received significant focus resulting in accelerated development of useful tools. However, conventional genetic engineering strategy has several issues and limitations, one of which is the complexity associated with the manipulation of large genomes of higher plants [2].

Currently, several tools that help to solve the problems of precise genome editing of plants are at scientists’ disposal. In 1996, for the first time, it was shown that protein domains such as “zinc fingers” coupled with FokI endonuclease domains act as site-specific nucleases (zinc finger nucleases (ZFNs)), which cleave the DNA in vitro in strictly defined regions [3]. Such a chimeric protein has a modular structure, because each of the “zinc finger” domains recognizes one triplet of nucleotides. This method became the basis for the editing of cultured cells, including model and nonmodel plants [4, 5].

Continued efforts and investigations led to the development of new genome editing tools such as TALENs (transcription activator-like effector nucleases) and CRISPR/Cas (clustered regularly interspaced short palindromic repeats). Designing TALENs requires reengineering of a new protein for each of the targets. However, the design process has recently been streamlined by making modules of repeat combinations available, which reduces the cloning required for the design. By contrast, the design and use of CRISPR are simple. Both TALEN and CRISPR systems have been shown to work in human cells, animals, and plants. Such editing systems, when used for efficient manipulation of genomes, could solve complex problems including the creation of mutant and transgenic plants [12, 41]. Moreover, chimeric proteins containing zinc finger domains and activation domains of other proteins, and those based on the TALE DNA-binding domain and Cas9 nuclease, were used in experiments for regulation of gene transcription, study of epigenomes, and the behavior of chromosome loci in the cell cycle [24, 42–44].

In this review, we briefly described the mechanisms of different genome editing systems and their use for crop improvement and also highlighted the multiple advantages and applications of engineered nucleases as well as biosafety and regulatory aspects of plants generated using engineered nuclease based technologies.

2. Mechanisms of Genome Editing Systems

Novel genome editing tools, also referred to as genome editing with engineered nuclease (GEEN) technologies, allow cleavage and rejoining of DNA molecules in specified sites to successfully modify the hereditary material of cells. To this end, special enzymes such as restriction endonucleases and ligase can be used for cleaving and rejoining of DNA molecules in small genomes like bacterial and viral genomes. However, using restriction endonucleases and ligases, it is extremely difficult to manipulate large and complex genomes of higher organisms, including plant genomes. The problem is that the restriction endonucleases can only “target” relatively short DNA sequences. While such specificity is enough for short DNA viruses and bacteria, it is not sufficient to work with large plant genomes. The first efforts to create methods for the editing of complex genomes were associated with the designing of “artificial enzymes” as oligonucleotides (short nucleotide sequences) that could selectively bind to specific sequences in the structure of the target DNA and have chemical groups capable of cleaving DNA [45].

A targeted approach to address this challenge was the design of chimeric nucleases, which are complex proteins containing one or two structural units, one of which catalyzes the cleavage of DNA, while the second is capable of selectively binding to specific nucleotide sequences of the target molecule, directing the nuclease action to this site (Table 1) [46, 47]. These chimeric nucleases can be “produced” directly in the cell: to this end, appropriately engineered vectors encoding nucleases need to be introduced into the cell. Such vectors are also supplied with a nuclear localization signal, which enables the nuclease to enter the cell nucleus, thereby getting access to genomic DNA.

2.1. Zinc Finger Nucleases (ZFNs)

ZFNs were the first generation of genome editing tools that use chimerically engineered nucleases, which were developed after the discovery of the working principles of the functional Cys2-His2 zinc finger (ZF) domain [3, 4, 46, 48]. Each Cys2-His2 ZF domain consists of 30 amino acid residues, which fold into a ββα configuration [48–50]. Crystallographic structure analysis showed that the Cys2-His2 ZF proteins bind to DNA by inserting an α-helix of the protein into the major groove of the DNA double helix [51]. Each ZF domain has the ability to recognize 3 tandem nucleotides in the DNA. A generalized ZFN monomer consists of two different functional domains: an artificial ZF Cys2-His2 domain at the N-terminal region and a nonspecific FokI DNA cleavage domain at the C-terminal region. FokI domain dimerization is critical for ZFN enzymatic activity [3]. Because each zinc finger domain in a series recognizes its own consecutive 3-bp target, individual domains can be interchanged, and rearranging their order gives the proteins harboring them unique binding specificities, enabling the targeting of specific, unique sequences in the genome. For example, a ZFN dimer in which each monomer carries 3 or 4 ZF domains recognizes a target sequence of 18 or 24 base pairs, respectively, which statistically form unique sites in the genomes of most organisms (Table 1).
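The arithmetic behind the 18-bp and 24-bp figures, and the claim that such sites are statistically unique, can be checked with a short Python sketch. It assumes a uniformly random genome sequence, which real genomes are not, and the 3-Gbp genome size in the example is a hypothetical value chosen only for illustration. The function names are likewise illustrative, and any spacer between the two half-sites is ignored.

```python
BP_PER_FINGER = 3  # each zinc finger domain recognizes one nucleotide triplet


def zfn_site_length(fingers_per_monomer, monomers=2):
    """Total number of base pairs recognized by a ZFN dimer."""
    return fingers_per_monomer * BP_PER_FINGER * monomers


def expected_random_matches(site_length_bp, genome_size_bp):
    """Expected number of chance matches to one specific site.

    Assumes every base is equally likely and independent (a simplification)
    and counts both strands of the genome.
    """
    return 2 * genome_size_bp / 4 ** site_length_bp


if __name__ == "__main__":
    genome_size = 3_000_000_000  # hypothetical genome size in bp
    for fingers in (3, 4):
        length = zfn_site_length(fingers)
        matches = expected_random_matches(length, genome_size)
        print(f"{fingers} fingers per monomer: {length} bp, "
              f"expected chance matches = {matches:.2e}")
```

Under these assumptions the expected number of chance matches falls below one for both site lengths, which is consistent with the statement that such sites are statistically unique.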

The design and application of ZFNs involve modular design, assembly, and optimization of zinc fingers against specific target DNA sequences followed by linking of individual ZFs towards targeting larger sequences. Over the years, zinc finger domains have been generated to recognize a large number of triplet nucleotides. This enabled the selection and linking of zinc fingers in a sequence that would permit recognition of the target sequence of interest.

Since the first report on zinc fingers in 1996, they have been successfully used in several organisms including plants [4]. Examples include targeted inactivation of endogenous genes in Arabidopsis [15, 16], high frequency modification of tobacco genes [17], and precise targeted addition of a herbicide-tolerance gene as well as insertional disruption of a target locus in maize [18]. ZFNs have also been used for trait stacking in maize [52, 53].

Zinc finger nucleases have revolutionized the field of genome editing by demonstrating the ability to manipulate genomic sites of interest and opened the gates for both basic and applied research. ZFNs provide advantages over other tools with respect to efficiency, high specificity, and minimal nontarget effects and current efforts are focused on further improving design and delivery as well as expanding their applications in diverse crops of interest.

2.2. Transcription Activator-Like Effector Nucleases (TALENs)

The quest for efficient and selective manipulation of target genomic DNA led to the identification of unique transcription activator-like effector (TALE) proteins, which recognize and activate specific plant promoters through a set of tandem repeats. These proteins formed the basis for a new genome editing system consisting of chimeric nucleases called TALE nucleases (TALENs) [47]. TALE proteins consist of a central domain responsible for DNA binding, a nuclear localization signal, and a domain that serves as an activator of transcription of the target gene (Table 1) [54]. The DNA-binding ability of these proteins was first described in 2007 [55], and a year later, two scientific groups decoded the recognition code of target DNA sequences by TALE proteins [56].

It is shown that the DNA-binding domain in TALE monomers in turn consists of a central repeat domain (CRD) that confers DNA binding and host specificity. The CRD consists of tandem repeats of 34 amino acid residues and each 34-amino acid long repeat in the CRD binds to one nucleotide in the target nucleotide sequence. Two of the amino acids of the repeat, located at positions 12 and 13, are highly variable (repeat variable diresidue (RVD)) and are responsible for the recognition of specific nucleotide with degeneracy of binding several nucleotides with differential efficiency. The last tandem repeat binding to nucleotide at the 3′-end of the recognition site consists of 20 amino acid residues only and, therefore, it is named as half-repeat. While TALE proteins, in general, can be designed to bind any DNA sequence of interest, studies have demonstrated that the 5′-most nucleotide base of the DNA sequence bound by a TALE protein should always be a Thymidine and that a deviation from this requirement can affect the efficacy of TALE transcription factors (TALE-TF), TALE recombinases (TALE-R), and TALENs [57].

After the DNA code recognition requirements by TALE proteins have been cracked, the very first effort undertaken was the creation of chimeric TALEN nucleases [5]. For this purpose, the sequence encoding the DNA-binding TALE domain was inserted into a plasmid vector previously used to create ZFN [58]. This resulted in the creation of a synthetic, chimeric sequence-specific nuclease genetic construct containing the DNA-binding domain of TALEs and the catalytic domain of FokI restriction endonuclease. This construct helped to create artificial nucleases with DNA-binding domain and different RVDs that can target any nucleotide sequence of interest [2, 4].

In most studies, the monomers with RVDs Asn and Ile (NI), Asn and Gly (NG), two Asn (NN), and His and Asp (HD) bind to nucleotides A, T, G, and C, respectively. NN, the most common RVD that specifies G, was also found to bind to A. This suboptimal or lack of specificity is a concern for the use of engineered TALEs for targeting DNA. Another RVD NK has less functional efficiency compared to NN, although it has demonstrated guanine specificity. Several studies have also shown that the use of NH or NK RVDs for specific binding of guanine reduces the risk of nontarget effects [19, 59, 60]. It has been shown that in RVD (NI, NG, NN, or HD) the first amino acid residue, whether it is N or H, is responsible for the stabilization of spatial conformation although it does not directly bind to a nucleotide, whereas the second amino acid residue binds to a nucleotide either through hydrogen bonding with nitrogenous bases (in case of D and N amino acids) or through van der Waals forces (in case of I and G) [61].
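The RVD-to-nucleotide code described above lends itself to a small lookup table. The Python sketch below is illustrative only: the function name is hypothetical, the input is assumed to include the 5′-most thymidine mentioned earlier in this section, and that thymidine is assumed not to be assigned a repeat of its own. The optional `guanine_rvd` argument reflects the alternatives (NN, NK, NH) discussed in the text.

```python
# RVD code as given in the text: NI -> A, NG -> T, NN -> G, HD -> C.
RVD_FOR_BASE = {"A": "NI", "T": "NG", "G": "NN", "C": "HD"}
GUANINE_RVDS = ("NN", "NK", "NH")


def rvds_for_target(site, guanine_rvd="NN"):
    """Return the list of RVDs for the bases that follow the 5' thymidine.

    `site` is assumed to start with the 5' T; that T is checked but is not
    given a repeat in this sketch.
    """
    site = site.upper()
    if guanine_rvd not in GUANINE_RVDS:
        raise ValueError("guanine_rvd must be one of NN, NK, NH")
    if not site.startswith("T"):
        raise ValueError("the 5'-most base of the site should be T")
    rvds = []
    for base in site[1:]:
        if base not in RVD_FOR_BASE:
            raise ValueError(f"unexpected base: {base}")
        rvds.append(guanine_rvd if base == "G" else RVD_FOR_BASE[base])
    return rvds


if __name__ == "__main__":
    print(rvds_for_target("TACGT"))
    print(rvds_for_target("TACGT", guanine_rvd="NH"))
```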

Based on the mode of action and specificity of TALENs, it should be possible to introduce double strand breaks in any location of the genome as long as that location harbors the recognition sequence corresponding to the DNA-binding domains of TALENs. There is another condition that also needs to be met, that is, the requirement of the presence of Thymidine before the 5′ end of the intended target sequence since it has been demonstrated that the W232 residue in the N-terminal portion of the DNA-binding domain interacts with the Thymidine and influences the binding efficiency [62]. It is also possible to overcome this 5′ Thymidine constraint by developing mutant variants of TALEN N-terminal domain which can bind other nucleotides [57]. Considering the ease of site-directed manipulation using TALEN system, within a short period of time after the unraveling of the TALEN mode of action, the genes modified by this system have been used successfully in several animal and plant species and the plant examples include rice, wheat, Arabidopsis, potato, and tomato (Table 2) [63].

2.3. Oligonucleotide-Directed Mutagenesis (ODM)

After first successful exploitation in mammalian systems, oligonucleotide-directed mutagenesis (ODM) has become another novel gene editing tool for plants [7, 64]. ODM, a tool for targeted mutagenesis, uses a specific 20- to 100-base long oligonucleotide, the sequence of which is identical to the target sequence in the genome except that it contains a single base pair change (intended mutation to be inserted in the genome) towards achieving site-directed editing of gene/sequence of interest (Table 1) [65]. When these synthetic oligonucleotides or repair templates with homology to a specific region of the target gene are transiently exposed to the plant cells by using a variety of specific delivery methods, they bind to the targets and activate cell’s natural repair machinery which recognizes the single mismatch in the template and then copies that mismatch or mutation into the target sequence through repair process [7, 65]. This produces the desired targeted single nucleotide or base editing in the plant genome that confers novel function or trait while the plant cell degrades the repair template oligonucleotide. Using tissue culture methods, cells with edited sequences are subsequently regenerated and genome edited novel varieties with improved traits/characteristics are developed through traditional breeding (Table 2) [7, 64, 65].
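As a rough sketch of the oligonucleotide design described above, the following Python function builds a repair template that matches the target sequence except for one intended base change. The function name and the 20-base flank are hypothetical choices; the default yields a 41-base oligonucleotide, which falls within the 20- to 100-base range mentioned in the text. Strand choice, chemical modifications and delivery are outside the scope of this sketch.

```python
def design_odm_oligo(target, position, new_base, flank=20):
    """Return an oligo identical to `target` around `position` (0-based)
    except that the base at `position` is replaced by `new_base`."""
    target = target.upper()
    new_base = new_base.upper()
    if len(new_base) != 1 or new_base not in "ACGT":
        raise ValueError("new_base must be one of A, C, G, T")
    if not 0 <= position < len(target):
        raise ValueError("position is outside the target sequence")
    if target[position] == new_base:
        raise ValueError("new_base must differ from the original base")
    start = max(0, position - flank)
    end = min(len(target), position + flank + 1)
    return target[start:position] + new_base + target[position + 1:end]


if __name__ == "__main__":
    # Hypothetical target: a CAC codon embedded in a run of G bases.
    target = "G" * 30 + "CAC" + "G" * 30
    print(design_odm_oligo(target, position=30, new_base="T"))
```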

2.4. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)

Another novel genome editing system that has emerged recently and has become widely popular is the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR associated (Cas) protein system, the most prominent being CRISPR/Cas9 (based on the Cas9 protein). This method utilizes an adaptive bacterial and archaeal immune system, the mechanism of which relies on the presence of special sites in the bacterial genome called CRISPR loci. These loci are composed of operons encoding the Cas9 protein and an array of repeat-spacer sequences. The spacers in the array are short fragments derived from foreign DNA (viral or plasmid) that have become integrated into the bacterial genome following recombination [41, 66].

Unlike the chimeric TALEN proteins, the CRISPR/Cas9 system recognizes its target site through complementary base pairing between the guide (noncoding) RNA and the DNA of the target site. The complex of guide RNA and Cas protein then cleaves the double-stranded DNA precisely, using the nuclease activity of the Cas9 endonuclease (Table 1) [9, 24, 67].

Several types of CRISPR protective systems functioning in cells of various bacteria are described in detail elsewhere [68, 69]. The most “popular” system is the CRISPR/Cas type II-A system found in the bacterium Streptococcus pyogenes and composed of three genes encoding CRISPR RNA (crRNA), trans-activating crRNA (tracrRNA), and Cas9 protein. Based on this system, universal genetic constructs encoding artificial elements of the CRISPR/Cas “genome editor” have been created [70]. Also, a simplified version of the system was created, functioning as a complex of Cas9 protein and a single guide RNA that combines the tracrRNA and a short, mature crRNA. The guide sequence identifies the target DNA site and binds to it based on complementarity, and Cas9 cleaves the DNA at the target site [71].
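Complementarity-based target recognition can be illustrated with a simple sequence scan. The Python sketch below looks for exact matches to a guide sequence on both strands. It also requires an NGG motif immediately 3′ of the match, the protospacer adjacent motif (PAM) known for S. pyogenes Cas9; the PAM is not discussed in the text above and is included here as an added assumption. The function names are hypothetical, mismatches are not tolerated, and hits on the minus strand are reported in reverse-complement coordinates.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def find_cas9_sites(dna, guide):
    """Find exact guide matches that are followed by an NGG PAM.

    `guide` is given in DNA letters (U is converted to T) and is compared
    with the strand that has the same sequence as the guide. Returns a
    list of (strand, start, pam) tuples; for strand "-" the start index
    refers to the reverse complement of `dna`.
    """
    dna = dna.upper()
    guide = guide.upper().replace("U", "T")
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        start = seq.find(guide)
        while start != -1:
            pam = seq[start + len(guide):start + len(guide) + 3]
            if len(pam) == 3 and pam[1:] == "GG":
                hits.append((strand, start, pam))
            start = seq.find(guide, start + 1)
    return hits


if __name__ == "__main__":
    guide = "ACGTTGCATGCCAGTACGAT"  # hypothetical 20-nt guide
    dna = "TT" + guide + "AGG" + "CCTT"
    print(find_cas9_sites(dna, guide))
```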

CRISPR system can be used for the creation of genetically modified cells grown in culture and living organisms [11]. In the first case, plasmids or viral vectors which provide high and stable synthesis of CRISPR/Cas9 system elements are introduced into cells. In the second case, cultured protoplasts and a plasmid coding CRISPR/Cas elements are used to obtain genetically modified plants [32]. Another approach, applied for plants, is the use of Agrobacterium, the natural “genetic engineer,” that contains a special plasmid harboring CRISPR/Cas9 system [41, 44].

Thus, due to its simplicity, efficiency, and wide capabilities, in a short time CRISPR/Cas9 system has already found use in various fields of fundamental and applied biology, biotechnology, and genetic engineering.

2.5. Repair of Cleaved Genomic Sites

An important step in the genome editing process is the repair of the DNA break created by the nucleases. The DNA break is repaired by endogenous cellular mechanisms: nonhomologous end-joining (NHEJ) or homology-dependent (or directed) repair (HDR) [14]. NHEJ is the simplest mechanism, in which the ends of the cleaved DNA are joined together, often resulting in the insertion or deletion of nucleotides (indels), thereby shifting the gene reading frame and resulting in a gene “knockout” [72]. If indels are not observed, the DNA is recovered, and there are no noticeable changes. On the other hand, HDR is a mechanism in which a sequence containing homology to the target is used as a template for repairing the break or the DNA lesion. Therefore, by providing a template that contains a desired sequence of interest flanked by sequences homologous to both sides of the break point, one can force the insertion of that desired sequence into the target site. When HDR occurs, homologous recombination is used to introduce new sequences for gene recovery or insertion [72]. This method is simple, acts precisely on the DNA target, and can be used in almost any modern molecular biology laboratory.
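Whether an NHEJ-induced indel shifts the reading frame depends on whether its net length is a multiple of three. The Python sketch below makes that rule explicit. It assumes the indel lies within a coding sequence and uses a hypothetical function name; note that an in-frame indel can still impair a protein, so "in-frame" here does not mean harmless.

```python
def indel_consequence(net_length_change_bp):
    """Classify an indel by its net length change in base pairs.

    Positive values are insertions and negative values are deletions.
    Assumes the indel falls inside a coding sequence.
    """
    if net_length_change_bp == 0:
        return "no net length change"
    if net_length_change_bp % 3 == 0:
        return "in-frame indel"
    return "frameshift"


if __name__ == "__main__":
    for change in (-1, 2, 3, -6, 0):
        print(change, "->", indel_consequence(change))
```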

3. Practical Applications of Genome Editing Systems

3.1. Application of “Genome Editors” for Functional Genomics

Several different types of genome modifications can be achieved by utilizing ZFN, TALEN, ODM, and CRISPR/Cas genome editing systems (Table 2). These include creation of point mutations, insertion of new genes in specific locations or deletion of large regions of the nucleotide sequences, and correction or substitution of individual genetic elements and gene fragments [4, 6, 10, 20, 23, 44, 73].

While introducing modifications to various genomic elements in plant cells and examining the results, scientists were able to investigate the role of individual genes in the functioning of individual cells and the organism as a whole. For example, the unique ability of CRISPR/Cas9 system to selectively bind to specific DNA sites has helped to regulate gene activity [24, 41, 44]. For this purpose, proteins activating or repressing the activity of promoters that control the gene function can be attached to the catalytically inactive mutant Cas9 protein. In one example, it was shown that complex binding to the target DNA can inhibit or stimulate the function of the target gene [44].

Furthermore, using CRISPR/Cas9 system, several genetic constructs targeted to different genome sites can simultaneously be introduced into cells [8, 24, 43]. This is a welcome feature in investigating intergenic interaction, if any, because several genes are simultaneously affected by the CRISPR/Cas9 system [44]. For example, using this approach, it was possible to identify genes involved in crop domestication process [74].

3.2. Application of Genome Editing Systems in Crop Improvement

Genome editing technologies have wide practical applications for solving one of the most important tasks of modern biotechnology—the creation of new varieties of crops, which are high-yielding and resistant to abiotic and biotic stresses and also have high nutritional value (Table 2) [31, 63, 75–80]. To this end, genome editing system has been used in plant breeding (1) to insert point mutations similar to natural SNPs [26, 27], (2) to make small modifications to gene function [13], (3) for integration of foreign genes, (4) for gene pyramiding and knockout, and (5) for the repression or activation of gene expression, as well as (6) epigenetic editing [6].

For example, the use of ZFN in Arabidopsis thaliana [15–17] and Zea mays [18] has led to the successful development of herbicide tolerant genotypes through insertion of herbicide-resistance genes into targeted sites in the genome [18]. ZFN was also used for the targeted modification of an endogenous malate dehydrogenase (MDH) gene in plants and the plants containing modified MDH have shown increased yield [81]. ODM technique has been significantly advanced through Cibus Rapid Trait Development System (RTDS) [7] and this technology has been successfully applied in several crops. Applications include but are not limited to the generation of herbicide tolerance, insect resistance, enhanced disease resistance (bacterial and viral), improved nutritional value, and enhanced yield without the introduction of foreign genes as has been used in traditional genetic engineering approach for crop development [7, 65]. A precise edit of CAC to TAC using ODM RTDS technology has been demonstrated; it converts BFP to GFP by changing Histidine (H66) to Tyrosine (Y66). This approach has offered a nontransgenic breeding tool for crops [7, 64].
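The CAC-to-TAC edit mentioned above can be verified against the standard genetic code with a short Python sketch. The codon table here is deliberately partial, containing only the histidine and tyrosine codons, and the function name is hypothetical; a codon outside the table raises a KeyError.

```python
# Partial codon table (standard genetic code): only the histidine and
# tyrosine codons needed for this example.
CODON_TO_AMINO_ACID = {"CAC": "His", "CAT": "His", "TAC": "Tyr", "TAT": "Tyr"}


def apply_point_edit(codon, position, new_base):
    """Replace one base of a codon (0-based position).

    Returns (edited_codon, original_amino_acid, edited_amino_acid).
    """
    codon = codon.upper()
    edited = codon[:position] + new_base.upper() + codon[position + 1:]
    return edited, CODON_TO_AMINO_ACID[codon], CODON_TO_AMINO_ACID[edited]


if __name__ == "__main__":
    print(apply_point_edit("CAC", 0, "T"))
```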

Using CRISPR/Cas9 technology, Jiang et al. [28] obtained a “biotech” oil from Camelina sativa seeds with an improved fatty acid composition, which makes it more beneficial to human health, more resistant to oxidation, and more appropriate for the production of certain commercial chemicals, including biofuels. Soyk et al. [29] used targeted mutagenesis of the SP5G gene of tomato to create plants with rapid flowering and a more compact bush, which in turn resulted in earlier harvest. In another effort, Osakabe et al. [31], using CRISPR-induced mutagenesis of the OST2 gene in Arabidopsis, obtained new alleles that confer salt stress resistance to plants.

Modulation of the gibberellin biosynthesis by genome editing methods has allowed creation of dwarf fruit trees [30], which have great potential for increasing productivity through higher density plantings and reduced labor costs. This results in a reduction of land, water, pesticide, and fertilizer use [82]. In addition, genome editing for inhibition of ethylene biosynthesis, which plays a very important role in fruit ripening process [82] or its signaling pathways, enables creation of new varieties with extended shelf life [63].

A major area of application of genome editing approaches in plant breeding is to create varieties resistant to various pathogens and/or pests. These methods have been used for the modification of the key plant immunity stages at different levels in several crops. This goal can be achieved by modifying (1) susceptibility genes (S-genes), (2) resistance genes (R-genes), (3) genes regulating the interaction between the effector and target, and (4) the genes regulating plant hormonal balance [78]. For example, wheat genotypes resistant to powdery mildew disease were obtained by TALEN- and CRISPR/Cas9-mediated genome editing on mildew-resistance locus O (MLO) [34]. Genome editing technologies have also been used to produce plants resistant to bacterial leaf blight, caused by Xanthomonas oryzae pv. oryzae [21].

The CRISPR/Cas9 system has been investigated for its efficacy in providing interference against geminiviruses. Using a transient transformation system in N. benthamiana, degradation/suppression of the curly top virus genome by single guide RNA/Cas9 (sgRNA/Cas9) has been demonstrated [35]. In other efforts, sgRNAs specific for tomato yellow leaf curl virus (TYLCV) or bean yellow dwarf virus (BeYDV) sequences were introduced into N. benthamiana plants expressing Cas9 endonuclease, and the plants were challenged with the corresponding viruses. These studies demonstrated that the CRISPR/Cas9 system not only targeted the viruses for degradation but also introduced mutations at the target sequences [36, 37], interfering with the copy number of freely replicating viruses [78].

Metabolic pathways that regulate hormonal balance can also be modified using genome editing technologies to enhance the immunomodulatory component of the plant immune system. This can be achieved by deactivating the ethylene-responsive factor (ERF). In particular, the ethylene-dependent pathway in rice has been successfully modified by CRISPR/Cas9-targeted mutagenesis of the OsERF922 gene, resulting in increased resistance to Magnaporthe oryzae [38, 39].

CRISPR/Cas9 has been used in Cucumis sativus to knock out the eIF4E gene, which encodes a eukaryotic translation initiation factor essential for the translation of viruses; the knockout confers resistance to viruses such as cucumber vein yellowing virus (CVYV), zucchini yellow mosaic virus (ZYMV), and papaya ring spot mosaic virus-W (PRSV-W) [40]. In addition, CRISPR/Cas9 was demonstrated to be a rapid and efficient genome editing system in Phytophthora sojae, an oomycete pathogen of soybean, by modifying the pathogenicity gene (Avr4/6), thereby opening up an avenue for the much needed functional genomics work in Phytophthora sojae towards the ultimate goal of controlling this pathogen [83].

Similarly, existing genome editing methods, in particular the CRISPR/Cas9 method, have been successfully used to obtain plants resistant to herbicides [33]. For example, editing of the ALS2 gene in maize allowed the creation of a mutant corn plant resistant to chlorsulfuron [33]. Acetolactate synthase (ALS) is a key enzyme in the biosynthesis of amino acids in plants and is inhibited by sulfonylurea herbicides.

Another interesting area of biotechnology where the CRISPR/Cas9 system has significant application is the development of plants capable of synthesizing human proteins such as insulin, necessary for patients with diabetes mellitus, or albumin, which is used in the treatment of hemorrhagic shock, burns, hypoproteinemia, and cirrhosis [84]. At present, albumin is prepared from human plasma, which is in very limited supply; however, global demand for albumin is constantly growing and currently equals 500 tons per year. To meet the growing needs, the human albumin gene has already been introduced into the rice genome using genomic engineering techniques [85]. Such expressed proteins can be isolated from the plant and animal tissues where they are synthesized and, after clarification, used for medical purposes.

Thus, as described above and extensively referenced herein, these novel genome editing techniques are being widely used for the purpose of crop improvement including new bioenergy crop developments [86]. However, the use of tissue culture with these GEEN methods may also create complexities that could slow the process of genome editing.

4. Safety Assessment Aspects of Genome Editing Systems

4.1. Nontarget Effects

Genome editing techniques, in essence, preserve the native genomic structure and, therefore, are considered as a safe technology for crop improvement. Despite this general understanding, there are some concerns related to the biosafety of crops created using these methods. One main concern in terms of its biosafety is the possibility of nontarget effects of synthetic nucleases during genome editing.

During the biotechnological application of genome editing methods, efficiency and specificity of the engineered nucleases are the two most important functional requirements and are closely related to the choice of the target site. For each endogenous genomic locus, the efficiency of DNA cleavage (both target and nontarget) depends not only on the nuclease activity (such as that of the FokI domains and the RuvC domain of Cas9 proteins), but also on the availability of a target site and the affinity of the DNA-binding domain (e.g., TAL effector domains and guide RNA, gRNA) for the target sequence. Specificity of engineered nucleases largely depends on the nuclease-DNA binding affinity, including the binding of zinc fingers to DNA (ZFNs), of TAL effectors to DNA (TALENs), and hybridization of gRNA with DNA (CRISPR), although dimerization of the FokI domain (ZFNs and TALENs) and Cas9 interaction with the motif contiguous to the protospacer adjacent motif (PAM) may also play an important role [87]. In the case of ZFNs, while examples abound with respect to the binding efficiency of ZFNs containing the canonical C2H2 binding domain, investigations of noncanonical ZFNs, such as those containing a C3H1 binding domain, have also demonstrated high levels of binding efficiency [88].

To minimize nontarget effects of genome editing systems, a crucial aspect is the careful selection of sites for the introduction of the double-stranded breaks by performing a prior bioinformatics analysis [89]. When choosing the desired sites, sites of repeated sequences and sites having a high homology with other regions of the genome should be avoided. In this regard, to facilitate the selection of the target sites for nucleases and experimental verification of the presence of nontarget effects, several software packages were developed that enable nuclease design and validation [79, 87, 90].
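The site-selection principle described above (avoid repeats and regions with high homology elsewhere in the genome) can be illustrated with a small sketch. This is not one of the software packages cited above; the function names are hypothetical, and the sketch assumes a 20-nt protospacer followed by an NGG PAM, forward-strand search only, and a plain mismatch count as a stand-in for real off-target scoring.

```python
import re


def find_candidates(seq: str, length: int = 20) -> list[tuple[int, str]]:
    """Return (start, protospacer) pairs that are followed by an NGG PAM (forward strand)."""
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % length)
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_near_matches(protospacer: str, genome: str, max_mismatches: int = 3) -> int:
    """Count genome windows within `max_mismatches` of the protospacer.

    If the target region is part of `genome`, the on-target site itself is counted.
    """
    genome = genome.upper()
    n = len(protospacer)
    return sum(
        hamming(protospacer, genome[i:i + n]) <= max_mismatches
        for i in range(len(genome) - n + 1)
    )


def rank_targets(region: str, genome: str, max_mismatches: int = 3):
    """Sort candidates so that sites with the fewest near matches come first."""
    scored = [
        (count_near_matches(p, genome, max_mismatches), start, p)
        for start, p in find_candidates(region)
    ]
    return sorted(scored)
```

Candidates that fall in repeated sequences receive high near-match counts and sink to the bottom of the ranking. Dedicated design tools additionally search both strands, weight mismatches by position, and require experimental validation of predicted nontarget sites.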

4.2. Regulation of Plants Created by Genome Editing

The novel genome editing systems help to introduce stably inherited point modifications into the plant genome, and the transgenic region can be easily removed after editing a target gene. This allows creation of nontransgenic plants and improved crop varieties [22, 91–93]. These technologies are faster than traditional breeding methods and help to obtain null segregant lines that have lost the transgene insertion [94–97]. Plants with targeted mutations developed by genome editing technology are nearly identical to plants obtained by classical breeding, and their safety must be assessed taking into account the resulting product rather than the process used to create them [98–100]. In this context, ODM-derived products are in many cases indistinguishable from conventionally bred or traditional mutagenesis products; therefore, such products should not be regulated in the same way as products generated by genetic engineering methods [7, 65]. Using the CRISPR-Cas9 system, it becomes possible to obtain marker-free genetically engineered crops, that is, crops without antibiotic resistance marker genes [6, 100]. Thus, in the case of new varieties with targeted mutations developed using genome editing systems, the existing rules for the regulation of genetically modified plants should not be applied [92, 95, 99, 100]. Currently, genome editing technologies are being discussed by various advisory and regulatory authorities in the context of GMO legislation. Crops and plants obtained using genome editing techniques are considered as nongenetically modified [95, 99, 101]. The European Commission is expected to publish a report on the regulatory uncertainty of genome editing methods [100, 102, 103].

5. Multitude of Advantages and Perspectives of GEENs

Tools of genome editing have a significant impact on basic and applied research in plant biology [24, 43, 44, 73]. The simplified approach to gene/genome editing represents a valuable tool for plant researchers in functional analysis of gene(s) and for breeders in the integration of key genes in the genomes of agriculturally important crops. Genome editing systems have several attractive features including simplicity, efficiency, high specificity, minimal nontarget effects, and amenability to multiplexing and thus are very promising for use in plant breeding [6].

Site-directed mutagenesis of different genes can provide important information about their functions. Simultaneous targeting of multiple genes/loci by applying multiplex strategies can promote research to identify the role of individual genes in intracellular signaling pathways and aid in the engineering of complex, multigenic agronomic traits in crops. Preferred uses of the CRISPR-Cas9 system include complete knockout of gene function [6, 64], microRNA knockdown screening [6], and programmed editing of certain loci, which can provide a functional separation of cis- and trans-regulatory elements/factors with high accuracy [6]. Another prospective application of the CRISPR-Cas9 system may be its use in the formation of conditional alleles, providing spatial and temporal control of gene expression to study the function of lethal genes. Use of inducible or tissue-specific promoters for expression of Cas9 and/or sgRNA can be instrumental for regulating gene expression in a specific tissue, at a specific developmental stage, or under different environmental conditions [6].

The CRISPR-Cas system opens up wide possibilities for labeling endogenous genes with fluorescent proteins to visualize their expression in vivo. Using fluorescently labeled dCas9, changes in genome dynamics and chromosome architecture during plant development, and their response to environmental stimuli, can be studied. These technologies can also be used for the selection of specific cell types, which greatly facilitates the study of various functional aspects [6]. Use of dCas9 can provide a new platform for the selection of activation/repression effector domains to specific genomic loci for regulating endogenous gene expression.

In addition, these technologies can be successfully used in the work on epigenome editing via the selection of proteins responsible for histone modification and DNA methylation, which has emerged as a new way of regulating cellular functions in plants [25]. For the purpose of understanding epigenetic regulation, CRISPR-Cas9 system can also be used for the enrichment of chromatin target sites for the identification of proteins attached to enriched chromatin. Likewise, CRISPR-Cas9 can be used as a tool to identify regulatory proteins binding to specific DNA sequences controlling the expression of genes.

6. Conclusion

Genome editing tools are becoming popular molecular tools of choice for functional genomics as well as crop improvement. Many examples exist currently where these editing systems are being harnessed for unprecedented understanding of plant biology and crop yield improvement through rapid and targeted mutagenesis and associated breeding [102, 104]. Because of their several attractive features such as simplicity, efficiency, high specificity, and amenability to multiplexing, genome editing technologies described here are revolutionizing the way crop breeding is done and paving the way for the next generation breeding.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

The authors thank the Academy of Sciences of Uzbekistan and the Science and Technology Agency of Uzbekistan for Research Grants nos. FA-F5-021 and FA-F5-025.

References

  1. M. F. Singer, “Introduction and historical background,” in Genetic Engineering, J. K. Setlow and A. Hollaender, Eds., vol. 1, pp. 1–13, Plenum, New York, NY, USA, 1979.
  2. A. A. Nemudryi, K. R. Valetdinova, S. P. Medvedev, and S. M. Zakian, “TALEN and CRISPR/Cas genome editing systems: tools of discovery,” Acta Naturae, vol. 6, no. 22, pp. 19–40, 2014.
  3. Y.-G. Kim, J. Cha, and S. Chandrasegaran, “Hybrid restriction enzymes: zinc finger fusions to Fok I cleavage domain,” Proceedings of the National Academy of Sciences of the United States of America, vol. 93, no. 3, pp. 1156–1160, 1996.
  4. T. Gaj, C. A. Gersbach, and C. F. Barbas III, “ZFN, TALEN, and CRISPR/Cas-based methods for genome engineering,” Trends in Biotechnology, vol. 31, no. 7, pp. 397–405, 2013.
  5. D. P. Weeks, M. H. Spalding, and B. Yang, “Use of designer nucleases for targeted gene and genome editing in plants,” Plant Biotechnology Journal, vol. 14, no. 2, pp. 483–495, 2016.
  6. V. Kumar and M. Jain, “The CRISPR-Cas system for plant genome editing: Advances and opportunities,” Journal of Experimental Botany, vol. 66, no. 1, pp. 47–57, 2015.
  7. N. J. Sauer, J. Mozoruk, R. B. Miller et al., “Oligonucleotide-directed mutagenesis for precision gene editing,” Plant Biotechnology Journal, vol. 14, no. 2, pp. 496–502, 2016.
  8. J. F. Li, J. E. Norville, and J. Aach, “Multiplex and homologous recombination-mediated genome editing in Arabidopsis and Nicotiana benthamiana using guide RNA and Cas9,” Nature Biotechnology, vol. 31, no. 8, pp. 688–691, 2013.
  9. M. Jinek, K. Chylinski, I. Fonfara, M. Hauer, J. A. Doudna, and E. Charpentier, “A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity,” Science, vol. 337, no. 6096, pp. 816–821, 2012.
  10. B. Chen, J. Hu, R. Almeida et al., “Expanding the CRISPR imaging toolset with Staphylococcus aureus Cas9 for simultaneous imaging of multiple genomic loci,” Nucleic Acids Research, vol. 44, no. 8, p. e75, 2016.
  11. S. W. Cho, S. Kim, J. M. Kim, and J.-S. Kim, “Targeted genome engineering in human cells with the Cas9 RNA-guided endonuclease,” Nature Biotechnology, vol. 31, no. 3, pp. 230–232, 2013.
  12. A. Noman, M. Aqeel, and S. He, “CRISPR-Cas9: Tool for qualitative and quantitative plant genome editing,” Frontiers in Plant Science, vol. 7, no. 2016, article no. 1740, 2016.
  13. Y. Mao, H. Zhang, N. Xu, B. Zhang, F. Gou, and J.-K. Zhu, “Application of the CRISPR-Cas system for efficient genome engineering in plants,” Molecular Plant, vol. 6, no. 6, pp. 2008–2011, 2013.
  14. P. D. Hsu, D. A. Scott, J. A. Weinstein et al., “DNA targeting specificity of RNA-guided Cas9 nucleases,” Nature Biotechnology, vol. 31, pp. 827–832, 2013.
  15. K. Osakabe, Y. Osakabe, and S. Toki, “Site-directed mutagenesis in Arabidopsis using custom-designed zinc finger nucleases,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 26, pp. 12034–12039, 2010.
  16. F. Zhang, M. L. Maeder, E. Unger-Wallace et al., “High frequency targeted mutagenesis in Arabidopsis thaliana using zinc finger nucleases,” Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 26, pp. 12028–12033, 2010.
  17. J. A. Townsend, D. A. Wright, R. J. Winfrey et al., “High-frequency modification of plant genes using engineered zinc-finger nucleases,” Nature, vol. 459, no. 7245, pp. 442–445, 2009.
  18. V. K. Shukla, Y. Doyon, J. C. Miller et al., “Precise genome modification in the crop species Zea mays using zinc-finger nucleases,” Nature, vol. 459, pp. 437–441, 2009.
  19. M. Christian, Y. Qi, Y. Zhang, and D. F. Voytas, “Targeted Mutagenesis of Arabidopsis thaliana Using Engineered TAL Effector Nucleases,” G3: Genes, Genomes, Genetics, vol. 3, no. 9, pp. 1697–1705, 2013.
  20. H. Zhang, F. Gou, J. Zhang et al., “TALEN-mediated targeted mutagenesis produces a large variety of heritable mutations in rice,” Plant Biotechnology Journal, vol. 14, no. 1, pp. 186–194, 2016.
  21. T. Li, B. Liu, M. H. Spalding, D. P. Weeks, and B. Yang, “High-efficiency TALEN-based gene editing produces disease-resistant rice,” Nature Biotechnology, vol. 30, no. 5, pp. 390–392, 2012.
  22. M. M. Mahfouz, L. Li, M. Piatek et al., “Targeted transcriptional repression using a chimeric TALE-SRDX repressor protein,” Plant Molecular Biology, vol. 78, no. 3, pp. 311–321, 2012.
  23. J. Gao, G. Wang, S. Ma et al., “CRISPR/Cas9-mediated targeted mutagenesis in Nicotiana tabacum,” Plant Molecular Biology, vol. 87, no. 1-2, pp. 99–110, 2015.
  24. L. Cong, F. A. Ran, D. Cox et al., “Multiplex genome engineering using CRISPR/Cas systems,” Science, vol. 339, no. 6121, pp. 819–823, 2013.
  25. H. Puchta, “Using CRISPR/Cas in three dimensions: towards synthetic plant genomes, transcriptomes and epigenomes,” Plant Journal, vol. 87, no. 1, pp. 5–15, 2016.
  26. T. B. Jacobs, P. R. LaFayette, R. J. Schmitz, and W. A. Parrott, “Targeted genome modifications in soybean with CRISPR/Cas9,” BMC Biotechnology, pp. 1–10, 2015.
  27. R. Xu, R. Qin, H. Li et al., “Generation of targeted mutant rice using a CRISPR-Cpf1 system,” Plant Biotechnology Journal, vol. 14, pp. 1–5, 2016.
  28. W. Z. Jiang, I. M. Henry, P. G. Lynagh, L. Comai, E. B. Cahoon, and D. P. Weeks, “Significant enhancement of fatty acid composition in seeds of the allohexaploid, Camelina sativa, using CRISPR/Cas9 gene editing,” Plant Biotechnology Journal, vol. 15, no. 5, pp. 648–657, 2017.
  29. S. Soyk, N. A. Müller, S. J. Park et al., “Variation in the flowering gene SELF PRUNING 5G promotes day-neutrality and early yield in tomato,” Nature Genetics, vol. 49, no. 1, pp. 162–168, 2017.
  30. J. Peng, D. E. Richards, and N. M. Hartley, “Green revolution genes encode mutant gibberellin response modulators,” Nature, vol. 400, no. 6741, pp. 256–261, 1999.
  31. Y. Osakabe, T. Watanabe, S. S. Sugano et al., “Optimization of CRISPR/Cas9 genome editing to modify abiotic stress responses in plants,” Scientific Reports, vol. 6, Article ID 26685, 2016.
  32. Q. Shan, Y. Wang, and J. Li, “Targeted genome modification of crop plants using a CRISPR-Cas system,” Nature Biotechnology, vol. 31, pp. 686–688, 2013.
  33. S. Svitashev, J. K. Young, C. Schwartz, H. Gao, S. C. Falco, and A. M. Cigan, “Targeted mutagenesis, precise gene editing, and site-specific gene insertion in maize using Cas9 and guide RNA,” Plant Physiology, vol. 169, no. 2, pp. 931–945, 2015.
  34. Y. Wang, X. Cheng, and Q. Shan, “Simultaneous editing of three homoeoalleles in hexaploid bread wheat confers heritable resistance to powdery mildew,” Nature Biotechnology, vol. 32, pp. 947–952, 2014.
  35. X. Ji, H. Zhang, Y. Zhang, Y. Wang, and C. Gao, “Establishing a CRISPR-Cas-like immune system conferring DNA virus resistance in plants,” Nature Plants, vol. 1, article 15144, 2015.
  36. Z. Ali, A. Abulfaraj, A. Idris, S. Ali, M. Tashkandi, and M. M. Mahfouz, “CRISPR/Cas9-mediated viral interference in plants,” Genome Biology, vol. 16, no. 1, article 238, 2015.
  37. N. J. Baltes, A. W. Hummel, and E. Konecna, “Conferring resistance to geminiviruses with the CRISPR-Cas prokaryotic immune system,” Nature Plants, vol. 1, article 15145, 2015.
  38. D. Liu, X. Chen, J. Liu, J. Ye, and Z. Guo, “The rice ERF transcription factor OsERF922 negatively regulates resistance to Magnaporthe oryzae and salt tolerance,” Journal of Experimental Botany, vol. 63, no. 10, pp. 3899–3912, 2012.
  39. F. Wang, C. Wang, P. Liu et al., “Enhanced rice blast resistance by CRISPR/Cas9-Targeted mutagenesis of the ERF transcription factor gene OsERF922,” PLoS ONE, vol. 11, no. 4, Article ID e0154027, 2016.
  40. J. Chandrasekaran, M. Brumin, D. Wolf et al., “Development of broad virus resistance in non-transgenic cucumber using CRISPR/Cas9 technology,” Molecular Plant Pathology, vol. 17, no. 7, pp. 1140–1153, 2016.
  41. F. Zhang, Y. Wen, and X. Guo, “CRISPR/Cas9 for genome editing: Progress, implications and challenges,” Human Molecular Genetics, vol. 23, no. 1, pp. R40–R46, 2014.
  42. J. F. Petolino and J. P. Davies, “Designed transcriptional regulators for trait development,” Plant Science, vol. 201-202, no. 1, pp. 128–136, 2013.
  43. H. Wang, H. Yang, C. S. Shivalila et al., “One-step generation of mice carrying mutations in multiple genes by CRISPR/Cas-mediated genome engineering,” Cell, vol. 153, no. 4, pp. 910–918, 2013.
  44. L. Lowder, A. Malzahn, and Y. Qi, “Rapid evolution of manifold CRISPR systems for plant genome editing,” Frontiers in Plant Science, vol. 7, no. 2016, article no. 1683, 2016.
  45. D. G. Knorre and V. V. Vlasov, “Reactive derivatives of nucleic acids and their components as affinity reagents,” Russian Chemical Reviews, vol. 54, no. 9, pp. 836–851, 1985.
  46. N. J. Palpant and D. Dudzinski, “Zinc finger nucleases: Looking toward translation,” Gene Therapy, vol. 20, no. 2, pp. 121–127, 2013.
  47. R. Jankele and P. Svoboda, “TAL effectors: Tools for DNA targeting,” Briefings in Functional Genomics, vol. 13, no. 5, pp. 409–419, 2014.
  48. C. O. Pabo, E. Peisach, and R. A. Grant, “Design and selection of novel Cys2His2 zinc finger proteins,” Annual Review of Biochemistry, vol. 70, pp. 313–340, 2001.
  49. T. Cathomen and J. Keith Joung, “Zinc-finger nucleases: the next generation emerges,” Molecular Therapy, vol. 16, no. 7, pp. 1200–1207, 2008.
  50. J. F. Petolino, “Genome editing in plants via designed zinc finger nucleases,” In Vitro Cellular and Developmental Biology - Plant, vol. 51, no. 1, 2015.
  51. N. P. Pavletich and C. O. Pabo, “Zinc finger-DNA recognition: crystal structure of a Zif268-DNA complex at 2.1 Å,” Science, vol. 252, no. 5007, pp. 809–817, 1991.
  52. W. M. Ainley, L. Sastry-Dent, M. E. Welter et al., “Trait stacking via targeted genome editing,” Plant Biotechnology Journal, vol. 11, no. 9, pp. 1126–1134, 2013.
  53. J. F. Petolino, A. Worden, K. Curlee et al., “Zinc finger nuclease-mediated transgene deletion,” Plant Molecular Biology, vol. 73, no. 6, pp. 617–628, 2010.
  54. S. Schornack, A. Meyer, P. Römer, T. Jordan, and T. Lahaye, “Gene-for-gene-mediated recognition of nuclear-targeted AvrBs3-like bacterial effector proteins,” Journal of Plant Physiology, vol. 163, no. 3, pp. 256–272, 2006.
  55. P. Römer, S. Hahn, T. Jordan, T. Strauß, U. Bonas, and T. Lahaye, “Plant pathogen recognition mediated by promoter activation of the pepper Bs3 resistance gene,” Science, vol. 318, no. 5850, pp. 645–648, 2007.
  56. J. Boch, H. Scholze, S. Schornack et al., “Breaking the code of DNA binding specificity of TAL-type III effectors,” Science, vol. 326, no. 5959, pp. 1509–1512, 2009.
  57. B. M. Lamb, A. C. Mercer, and C. F. Barbas III, “Directed evolution of the TALE N-terminal domain for recognition of all 50 bases,” Nucleic Acids Research, vol. 41, no. 21, pp. 9779–9785, 2013.
  58. M. Christian, T. Cermak, E. L. Doyle et al., “Targeting DNA double-strand breaks with TAL effector nucleases,” Genetics, vol. 186, no. 2, pp. 757–761, 2010.
  59. L. Cong, R. H. Zhou, Y.-C. Kuo, M. Cunniff, and F. Zhang, “Comprehensive interrogation of natural TALE DNA-binding modules and transcriptional repressor domains,” Nature Communications, vol. 3, article 968, 2012.
  60. M. L. Christian, Z. L. Demorest, C. G. Starker et al., “Targeting G with TAL Effectors: A Comparison of Activities of TALENs Constructed with NN and NK Repeat Variable Di-Residues,” PLoS ONE, vol. 7, no. 9, Article ID e45383, 2012.
  61. J. Streubel, C. Blücher, A. Landgraf, and J. Boch, “TAL effector RVD specificities and efficiencies,” Nature Biotechnology, vol. 30, no. 7, pp. 593–595, 2012.
  62. A. N.-S. Mak, P. Bradley, R. A. Cernadas, A. J. Bogdanove, and B. L. Stoddard, “The crystal structure of TAL effector PthXo1 bound to its DNA target,” Science, vol. 335, no. 6069, pp. 716–719, 2012.
  63. J. Xiong, J. Ding, and Y. Li, “Genome-editing technologies and their potential application in horticultural crop breeding,” Horticulture Research, vol. 2, article 15019, 2015.
  64. I. Y. Abdurakhmonov, “Genomics Era for Plants and Crop Species – Advances Made and Needed Tasks Ahead,” in Plant Genomics, I. Abdurakhmonov, Ed., InTech, Croatia, Balkans, 2016.
  65. CropLife International, “Oligonucleotide-Directed Mutagenesis (ODM),” LJournal, 2017.
  66. R. Barrangou, C. Fremaux, and H. Deveau, “CRISPR provides acquired resistance against viruses in prokaryotes,” Science, vol. 315, no. 5819, pp. 1709–1712, 2007.
  67. E. Deltcheva, K. Chylinski, C. M. Sharma et al., “CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III,” Nature, vol. 471, no. 7340, pp. 602–607, 2011.
  68. D. H. Haft, J. Selengut, E. F. Mongodin, and K. E. Nelson, “A guild of 45 CRISPR-associated (Cas) protein families and multiple CRISPR/cas subtypes exist in prokaryotic genomes,” PLoS Computational Biology, vol. 1, article e60, no. 6, pp. 0474–0483, 2005.
  69. A. F. Gilles and M. Averof, “Functional genetics for all: Engineered nucleases, CRISPR and the gene editing revolution,” EvoDevo, vol. 5, no. 1, article no. 43, 2014.
  70. J. A. Doudna and E. Charpentier, “The new frontier of genome engineering with CRISPR-Cas9,” Science, vol. 346, no. 6213, 2014.
  71. D. B. Graham and D. E. Root, “Resources for the design of CRISPR gene editing experiments,” Genome Biology, vol. 16, no. 1, article no. 260, 2015.
  72. L. C. Perkin, S. L. Adrianos, and B. Oppert, “Gene disruption technologies have the potential to transform stored product insect pest control,” Insects, vol. 7, no. 3, article no. 46, 2016.
  73. P. Perez-Pinera, D. G. Ousterout, and C. A. Gersbach, “Advances in targeted genome editing,” Current Opinion in Chemical Biology, vol. 16, no. 3-4, pp. 268–277, 2012.
  74. L. Chen, L. Tang, H. Xiang et al., “Advances in genome editing technology and its promising application in evolutionary and ecological studies,” GigaScience, vol. 3, no. 1, article no. 24, 2014.
  75. C. Kissoudis, C. van de Wiel, R. G. F. Visser, and G. van der Linden, “Enhancing crop resilience to combined abiotic and biotic stress through the dissection of physiological and molecular crosstalk,” Frontiers in Plant Science, vol. 5, no. MAY, article no. 207, 2014.
  76. L. Liu and X.-D. Fan, “CRISPR-Cas system: A powerful tool for genome engineering,” Plant Molecular Biology, vol. 85, no. 3, pp. 209–218, 2014.
  77. M. Jain, “Function genomics of abiotic stress tolerance in plants: A CRISPR approach,” Frontiers in Plant Science, vol. 6, no. MAY, article no. 375, pp. 1–4, 2015.
  78. G. Andolfo, P. Iovieno, L. Frusciante, and M. R. Ercolano, “Genome-editing technologies for enhancing plant disease resistance,” Frontiers in Plant Science, vol. 7, no. 2016, article no. 1813, 2016.
  79. S. Khatodia, K. Bhatotia, N. Passricha, S. M. P. Khurana, and N. Tuteja, “The CRISPR/Cas genome-editing tool: Application in improvement of crops,” Frontiers in Plant Science, vol. 7, no. 2016, article no. 506, 2016.
  80. R. C. Nongpiur, S. L. Singla-Pareek, and A. Pareek, “Genomics Approaches for Improving Salinity Stress Tolerance in Crop Plants,” Current Genomics, vol. 17, no. 4, pp. 343–357, 2016.
  81. V. Shukla, M. Gupta, F. Urnov, D. Guschin, M. Jan, and P. Bundock, “Targeted modification of malate dehydrogenase, 2013,” WO Patent Publication Number: WO 2013166315 A1.
  82. C. A. Hollender and C. Dardick, “Molecular basis of angiosperm tree architecture,” New Phytologist, vol. 206, no. 2, pp. 541–556, 2015.
  83. Y. Fang and B. M. Tyler, “Efficient disruption and replacement of an effector gene in the oomycete Phytophthora sojae using CRISPR/Cas9,” Molecular Plant Pathology, vol. 17, no. 1, pp. 127–139, 2016.
  84. G. E. Hastings and P. G. Wolf, “The Therapeutic Use of Albumin,” Archives of Family Medicine, vol. 1, no. 2, pp. 281–287, 1992.
  85. Y. He, T. Ning, T. Xie et al., “Large-scale production of functional human serum albumin from transgenic rice seeds,” Proceedings of the National Academy of Sciences of the United States of America, vol. 108, no. 47, pp. 19078–19083, 2011.
  86. M. Bosch and S. P. Hazen, “Lignocellulosic feedstocks: Research progress and challenges in optimizing biomass quality and yield,” Frontiers in Plant Science, vol. 4, article no. 474, 2013.
  87. C. M. Lee, T. J. Cradick, E. J. Fine, and G. Bao, “Nuclease target site selection for maximizing on-target activity and minimizing off-target effects in genome editing,” Molecular Therapy, vol. 24, no. 3, pp. 475–487, 2016.
  88. Q. C. Cai, J. Miller, F. Urnov et al., “Optimized non-canonical zinc finger proteins,” US Patent Number: 9,187,758. Publication date: Nov 17, 2015.
  89. A. Lombardo, D. Cesana, P. Genovese et al., “Site-specific integration and tailoring of cassette design for sustainable gene transfer,” Nature Methods, vol. 8, no. 10, pp. 861–869, 2011.
  90. T. Koo, J. Lee, and J. Kim, “Measuring and reducing off-target activities of programmable nucleases including CRISPR-Cas9,” Molecules and Cells, vol. 38, no. 6, pp. 475–481, 2015.
  91. Y. Gao and Y. Zhao, “Specific and heritable gene editing in Arabidopsis,” Proceedings of the National Academy of Sciences of the United States of America, vol. 111, no. 12, pp. 4357-4358, 2014. View at: Publisher Site | Google Scholar
  92. C. Nagamangala Kanchiswamy, D. J. Sargent, R. Velasco, M. E. Maffei, and M. Malnoy, “Looking forward to genetically edited fruit crops,” Trends in Biotechnology, vol. 33, no. 2, pp. 62–64, 2015. View at: Publisher Site | Google Scholar
  93. R.-F. Xu, H. Li, R.-Y. Qin et al., “Generation of inheritable and "transgene clean" targeted genome-modified rice in later generations using the CRISPR/Cas9 system,” Scientific Reports, vol. 5, Article ID 11491, 2015. View at: Publisher Site | Google Scholar
  94. N. Podevin, Y. Devos, H. V. Davies, and K. M. Nielsen, “Transgenic or not? No simple answer! New biotechnology-based plant breeding techniques and the regulatory landscape,” EMBO Reports, vol. 13, no. 12, pp. 1057–1061, 2012. View at: Publisher Site | Google Scholar
  95. M. Araki and T. Ishii, “Towards social acceptance of plant breeding by genome editing,” Trends in Plant Science, vol. 20, no. 3, pp. 145–149, 2015. View at: Publisher Site | Google Scholar
  96. J. G. Schaart, C. C. M. van de Wiel, L. A. P. Lotz, and M. J. M. Smulders, “Opportunities for Products of New Plant Breeding Techniques,” Trends in Plant Science, vol. 21, no. 5, pp. 438–449, 2016. View at: Publisher Site | Google Scholar
  97. J. W. Woo, J. Kim, S. I. Kwon et al., “DNA-free genome editing in plants with preassembled CRISPR-Cas9 ribonucleoproteins,” Nature Biotechnology, vol. 33, no. 11, pp. 1162–1164, 2015. View at: Publisher Site | Google Scholar
  98. F. Hartung and J. Schiemann, “Precise plant breeding using new genome editing techniques: Opportunities, safety and regulation in the EU,” Plant Journal, vol. 78, no. 5, pp. 742–752, 2014. View at: Publisher Site | Google Scholar
  99. D. F. Voytas and C. Gao, “Precision genome engineering and agriculture: opportunities and regulatory challenges,” PLoS biology, vol. 12, no. 6, p. e1001877, 2014. View at: Publisher Site | Google Scholar
  100. H. D. Jones, “Regulatory uncertainty over genome editing,” Nature Plants, vol. 1, Article ID 14011, 2015. View at: Publisher Site | Google Scholar
  101. M. Lusser, C. Parisi, D. Plan, and E. Rodríguez-Cerezo, “Deployment of new biotechnologies in plant breeding,” Nature Biotechnology, vol. 30, no. 3, pp. 231–239, 2012. View at: Publisher Site | Google Scholar
  102. K. Belhaj, A. Chaparro-Garcia, S. Kamoun, N. J. Patron, and V. Nekrasov, “Editing plant genomes with CRISPR/Cas9,” Current Opinion in Biotechnology, vol. 32, pp. 76–84, 2015. View at: Publisher Site | Google Scholar
  103. J. D. Wolt, K. Wang, and B. Yang, “The regulatory status of genome-edited crops,” Plant Biotechnology Journal, vol. 14, no. 2, pp. 510–518, 2016. View at: Publisher Site | Google Scholar
  104. S. Huang, D. Weigel, R. N. Beachy, and J. Li, “A proposed regulatory framework for genome-edited crops,” Nature Genetics, vol. 48, no. 2, pp. 109–111, 2016. View at: Publisher Site | Google Scholar

Copyright

Copyright © 2017 Venera S. Kamburova et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Genome Editing in Plants: An Overview of Tools and Applications.

Since the advent of recombinant DNA technology in Paul Berg's laboratory [1] in 1972, genetic engineering has come a long way and achieved enormous success. Many molecular and genetic mechanisms and phenomena have been discovered and studied in detail and the knowledge accumulated now permits researchers to reproduce experiments in vitro. Several decades-long investigations in molecular genetics and biochemistry of bacteria and viruses have allowed researchers to develop new methods of manipulating DNA through creation of various vector systems and tools for their delivery into the cell. All of these developments allow successful creation of not only transgenic microorganisms but also genetically modified higher organisms including various plant and crop species. Creation of novel tools for breeding and biotechnology, an application area of genetic engineering, has received significant focus resulting in accelerated development of useful tools. However, conventional genetic engineering strategy has several issues and limitations, one of which is the complexity associated with the manipulation of large genomes of higher plants [2].

Currently, several tools that help to solve the problems of precise genome editing of plants are at scientists' disposal. In 1996, it was shown for the first time that protein domains such as "zinc fingers" coupled with FokI endonuclease domains act as site-specific nucleases (zinc finger nucleases (ZFNs)), which cleave DNA in vitro in strictly defined regions [3]. Such a chimeric protein has a modular structure, because each of the "zinc finger" domains recognizes one triplet of nucleotides. This method became the basis for the editing of cultured cells, including those of model and nonmodel plants [4, 5].

Continued efforts and investigations led to the development of new genome editing tools such as TALENs (transcription activator-like effector nucleases) and CRISPR/Cas (clustered regularly interspaced short palindromic repeats). Designing TALENs requires reengineering of a new protein for each target. However, the design process has recently been streamlined by making modules of repeat combinations available, which substantially reduces the cloning required. In contrast, CRISPR is simple to design and use. Both TALEN and CRISPR systems have been shown to work in human cells, animals, and plants. Such editing systems, when used for efficient manipulation of genomes, could solve complex problems including the creation of mutant and transgenic plants [12, 41]. Moreover, chimeric proteins containing zinc finger domains and activation domains of other proteins, and those based on the TALE DNA-binding domain and Cas9 nuclease, have been used in experiments on the regulation of gene transcription, the study of epigenomes, and the behavior of chromosome loci in the cell cycle [24, 42-44].

In this review, we briefly describe the mechanisms of different genome editing systems and their use for crop improvement. We also highlight the multiple advantages and applications of engineered nucleases as well as the biosafety and regulatory aspects of plants generated using engineered-nuclease-based technologies.

2. Mechanisms of Genome Editing Systems

Novel genome editing tools, also referred to as genome editing with engineered nuclease (GEEN) technologies, allow cleavage and rejoining of DNA molecules in specified sites to successfully modify the hereditary material of cells. To this end, special enzymes such as restriction endonucleases and ligase can be used for cleaving and rejoining of DNA molecules in small genomes like bacterial and viral genomes. However, using restriction endonucleases and ligases, it is extremely difficult to manipulate large and complex genomes of higher organisms, including plant genomes. The problem is that the restriction endonucleases can only "target" relatively short DNA sequences. While such specificity is enough for short DNA viruses and bacteria, it is not sufficient to work with large plant genomes. The first efforts to create methods for the editing of complex genomes were associated with the designing of "artificial enzymes" as oligonucleotides (short nucleotide sequences) that could selectively bind to specific sequences in the structure of the target DNA and have chemical groups capable of cleaving DNA [45].

A targeted approach to address this challenge was the design of chimeric nucleases, which are complex proteins containing one or two structural units, one of which catalyzes the cleavage of DNA, while the second is capable of selectively binding to specific nucleotide sequences of the target molecule, directing the nuclease action to this site (Table 1) [46, 47]. These chimeric nucleases can be "produced" directly in the cell: to this end, appropriately engineered vectors encoding nucleases need to be introduced into the cell. Such vectors are also supplied with a nuclear localization signal, which enables the nuclease to enter the cell nucleus and thereby gain access to genomic DNA.

2.1. Zinc Finger Nucleases (ZFNs). ZFNs were the first generation of genome editing tools that use chimerically engineered nucleases, and they were developed after the discovery of the working principles of the functional Cys2-His2 zinc finger (ZF) domain [3, 4, 46, 48]. Each Cys2-His2 ZF domain consists of 30 amino acid residues, which fold into a ββα configuration [48-50]. Crystallographic structure analysis showed that the Cys2-His2 ZF proteins bind to DNA by inserting an α-helix of the protein into the major groove of the DNA double helix [51]. Each ZF domain has the ability to recognize 3 tandem nucleotides in the DNA. A generalized ZFN monomer consists of two different functional domains: an artificial ZF Cys2-His2 domain at the N-terminal region and a nonspecific FokI DNA cleavage domain at the C-terminal region. FokI domain dimerization is critical for ZFN enzymatic activity [3]. The observation that a series of zinc finger domains recognizes the corresponding consecutive 3-bp targets in a modular fashion led to the realization that individual zinc finger domains are interchangeable and that rearranging the order of the domains gives the proteins harboring them unique binding specificities, thereby enabling targeting of specific, unique sequences in the genome. For example, a ZFN dimer consisting of two monomers with 3 or 4 ZF domains each recognizes a target sequence of 18 or 24 base pairs, which statistically forms a unique site in the genomes of most organisms (Table 1).
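The target-length arithmetic above can be checked with a short sketch. It takes 3 bp per finger and two monomers from the text; the function names, the uniform random-sequence model, and the example genome size are illustrative assumptions, and the spacer between the two half-sites is ignored.

```python
def zfn_site_length(fingers_per_monomer, bp_per_finger=3, monomers=2):
    """Total number of base pairs recognized by a ZFN dimer."""
    return fingers_per_monomer * bp_per_finger * monomers


def expected_random_hits(site_length, genome_size_bp):
    """Expected occurrences of one fixed site in a random genome.

    Assumption: bases are independent and equally frequent, which real
    genomes are not, so this is only a rough order-of-magnitude guide.
    """
    return genome_size_bp / 4 ** site_length


if __name__ == "__main__":
    genome_size = 2_500_000_000  # hypothetical plant genome size in bp
    for fingers in (3, 4):
        length = zfn_site_length(fingers)
        print(fingers, length, expected_random_hits(length, genome_size))
```

Under this simplified model, an expected count well below 1 is what "statistically unique" means for an 18- or 24-bp site.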

The design and application of ZFNs involve modular design, assembly, and optimization of zinc fingers against specific target DNA sequences followed by linking of individual ZFs towards targeting larger sequences. Over the years, zinc finger domains have been generated to recognize a large number of triplet nucleotides. This enabled the selection and linking of zinc fingers in a sequence that would permit recognition of the target sequence of interest.

Since the first report on zinc fingers in 1996, they have been successfully used in several organisms including plants [4]. Examples include targeted inactivation of endogenous genes in Arabidopsis [15, 16], high frequency modification of tobacco genes [17], and precise targeted addition of a herbicide-tolerance gene as well as insertional disruption of a target locus in maize [18]. ZFNs have also been used for trait stacking in maize [52, 53].

Zinc finger nucleases have revolutionized the field of genome editing by demonstrating the ability to manipulate genomic sites of interest and opened the gates for both basic and applied research. ZFNs provide advantages over other tools with respect to efficiency, high specificity, and minimal nontarget effects and current efforts are focused on further improving design and delivery as well as expanding their applications in diverse crops of interest.

2.2. Transcription Activator-Like Effector Nucleases (TALENs). The quest for efficient and selective manipulation of target genomic DNA led to the identification of unique transcription activator-like effector (TALE) proteins, which recognize and activate specific plant promoters through a set of tandem repeats. These proteins formed the basis for the creation of a new genome editing system consisting of chimeric nucleases called TALE nucleases (TALENs) [47]. TALE proteins consist of a central domain responsible for DNA binding, a nuclear localization signal, and a domain that serves as an activator of transcription of the target gene (Table 1) [54]. The DNA-binding ability of these proteins was first described in 2007 [55], and a year later two research groups decoded the code by which TALE proteins recognize target DNA sequences [56].

It has been shown that the DNA-binding domain in TALE monomers in turn consists of a central repeat domain (CRD) that confers DNA binding and host specificity. The CRD consists of tandem repeats of 34 amino acid residues, and each 34-amino-acid repeat in the CRD binds to one nucleotide in the target nucleotide sequence. Two of the amino acids of the repeat, located at positions 12 and 13, are highly variable (repeat variable diresidue (RVD)) and are responsible for the recognition of a specific nucleotide, with some degeneracy: an RVD can bind several nucleotides with differential efficiency. The last tandem repeat, which binds to the nucleotide at the 3'-end of the recognition site, consists of only 20 amino acid residues and is therefore called a half-repeat. While TALE proteins, in general, can be designed to bind any DNA sequence of interest, studies have demonstrated that the 5'-most nucleotide base of the DNA sequence bound by a TALE protein should always be a Thymidine and that a deviation from this requirement can affect the efficacy of TALE transcription factors (TALE-TF), TALE recombinases (TALE-R), and TALENs [57].

After the DNA code recognition requirements by TALE proteins have been cracked, the very first effort undertaken was the creation of chimeric TALEN nucleases [5]. For this purpose, the sequence encoding the DNA-binding TALE domain was inserted into a plasmid vector previously used to create ZFN [58]. This resulted in the creation of a synthetic, chimeric sequence-specific nuclease genetic construct containing the DNA-binding domain of TALEs and the catalytic domain of FokI restriction endonuclease. This construct helped to create artificial nucleases with DNA-binding domain and different RVDs that can target any nucleotide sequence of interest [2, 4].

In most studies, the monomers with RVDs Asn and Ile (NI), Asn and Gly (NG), two Asn (NN), and His and Asp (HD) bind to nucleotides A, T, G, and C, respectively. NN, the most common RVD that specifies G, was also found to bind to A. This suboptimal or lack of specificity is a concern for the use of engineered TALEs for targeting DNA. Another RVD NK has less functional efficiency compared to NN, although it has demonstrated guanine specificity. Several studies have also shown that the use of NH or NK RVDs for specific binding of guanine reduces the risk of non-target effects [19, 59, 60]. It has been shown that in RVD (NI, NG, NN, or HD) the first amino acid residue, whether it is N or H, is responsible for the stabilization of spatial conformation although it does not directly bind to a nucleotide, whereas the second amino acid residue binds to a nucleotide either through hydrogen bonding with nitrogenous bases (in case of D and N amino acids) or through van der Waals forces (in case of I and G) [61].
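The RVD code described above lends itself to a small lookup sketch. The mapping (NI, NG, NN, HD for A, T, G, C) and the 5' Thymidine requirement come from the text; the function names are hypothetical, and treating the leading T as a base that is not assigned an RVD is a simplifying assumption for illustration.

```python
from typing import List

# RVD chosen for each target base, as listed in the text.
RVD_FOR_BASE = {"A": "NI", "T": "NG", "G": "NN", "C": "HD"}


def design_rvds(site: str) -> List[str]:
    """Return one RVD per base of `site` after its leading 5' T.

    Assumption: `site` is written 5'->3' and begins with the required T,
    which is not assigned an RVD in this sketch.
    """
    site = site.upper()
    if not site.startswith("T"):
        raise ValueError("site should begin with a 5' T")
    unknown = set(site) - set(RVD_FOR_BASE)
    if unknown:
        raise ValueError("unsupported bases: %s" % sorted(unknown))
    return [RVD_FOR_BASE[base] for base in site[1:]]


def ambiguous_positions(rvds: List[str]) -> List[int]:
    """Indices of NN repeats, which the text notes can also bind A."""
    return [i for i, rvd in enumerate(rvds) if rvd == "NN"]


if __name__ == "__main__":
    rvds = design_rvds("TACGT")
    print(rvds, ambiguous_positions(rvds))
```

Flagging NN positions is one simple way to surface the reduced specificity discussed above before committing to a design.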

Based on the mode of action and specificity of TALENs, it should be possible to introduce double strand breaks in any location of the genome as long as that location harbors the recognition sequence corresponding to the DNA-binding domains of TALENs. There is another condition that also needs to be met, that is, the requirement of the presence of Thymidine before the 5' end of the intended target sequence since it has been demonstrated that the W232 residue in the N-terminal portion of the DNA-binding domain interacts with the Thymidine and influences the binding efficiency [62]. It is also possible to overcome this 5' Thymidine constraint by developing mutant variants of TALEN N-terminal domain which can bind other nucleotides [57]. Considering the ease of site-directed manipulation using TALEN system, within a short period of time after the unraveling of the TALEN mode of action, the genes modified by this system have been used successfully in several animal and plant species and the plant examples include rice, wheat, Arabidopsis, potato, and tomato (Table 2) [63].

2.3. Oligonucleotide-Directed Mutagenesis (ODM). After first successful exploitation in mammalian systems, oligonucleotide-directed mutagenesis (ODM) has become another novel gene editing tool for plants [7, 64]. ODM, a tool for targeted mutagenesis, uses a specific 20- to 100-base long oligonucleotide, the sequence of which is identical to the target sequence in the genome except that it contains a single base pair change (intended mutation to be inserted in the genome) towards achieving site-directed editing of gene/sequence of interest (Table 1) [65]. When these synthetic oligonucleotides or repair templates with homology to a specific region of the target gene are transiently exposed to the plant cells by using a variety of specific delivery methods, they bind to the targets and activate cell's natural repair machinery which recognizes the single mismatch in the template and then copies that mismatch or mutation into the target sequence through repair process [7, 65]. This produces the desired targeted single nucleotide or base editing in the plant genome that confers novel function or trait while the plant cell degrades the repair template oligonucleotide. Using tissue culture methods, cells with edited sequences are subsequently regenerated and genome edited novel varieties with improved traits/characteristics are developed through traditional breeding (Table 2) [7, 64, 65].
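The construction of an ODM repair template can be sketched in a few lines. The 20- to 100-base length range and the single-base change come from the text; the function name, the symmetric-flank layout, and the default flank length are hypothetical choices made for illustration.

```python
def odm_oligo(target: str, pos: int, new_base: str, flank: int = 20) -> str:
    """Build an oligo identical to `target` around `pos` except for one base.

    `pos` is a 0-based index into `target`; `flank` bases are kept on each
    side where available (an illustrative layout, not a design rule).
    """
    target = target.upper()
    new_base = new_base.upper()
    if target[pos] == new_base:
        raise ValueError("new base equals the existing base; nothing to edit")
    start = max(0, pos - flank)
    end = min(len(target), pos + flank + 1)
    oligo = target[start:pos] + new_base + target[pos + 1:end]
    if not 20 <= len(oligo) <= 100:
        raise ValueError("oligo length outside the 20-100 base range")
    return oligo


if __name__ == "__main__":
    region = "ACGT" * 20  # hypothetical 80-nt target region
    oligo = odm_oligo(region, 40, "T")
    mismatches = sum(a != b for a, b in zip(oligo, region[20:61]))
    print(len(oligo), mismatches)
```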

2.4. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR). Another novel genome editing system that has emerged recently and has become widely popular is the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) protein system, the most prominent being CRISPR/Cas9 (based on the Cas9 protein). This method utilizes an adaptive bacterial and archaeal immune system, the mechanism of which relies on the presence of special sites in the bacterial genome called CRISPR loci. These loci are composed of operons encoding the Cas9 protein and an array of repeat-spacer sequences. The spacers in the array are short fragments derived from foreign DNA (viral or plasmid) that have become integrated into the bacterial genome following recombination [41, 66].

Unlike the chimeric TALEN proteins, the CRISPR/Cas9 system recognizes its target site through complementary base pairing between the guide (noncoding) RNA and the DNA of the target site. The complex of guide RNA and Cas protein has the nuclease activity, with the Cas9 endonuclease cleaving the double-stranded DNA at an exact position (Table 1) [9, 24, 67].

Several types of CRISPR protective systems functioning in cells of various bacteria are described in detail elsewhere [68, 69]. The most "popular" system is the CRISPR/Cas type II-A system found in the bacterium Streptococcus pyogenes and composed of three genes encoding CRISPR RNA (crRNA), trans-activating crRNA (tracrRNA), and Cas9 protein. Based on this system, universal genetic constructs encoding artificial elements of the CRISPR/Cas "genome editor" have been created [70]. A simplified version of the system was also created, which functions as a complex of Cas9 protein and a single guide RNA consisting of the tracrRNA and a short, mature crRNA. The guide sequence identifies the target DNA site and binds to it based on complementarity, and Cas9 cleaves the DNA at the target site [71].
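Target-site identification by complementarity can be illustrated with a simple scan for candidate sites. The 20-nt guide length and the NGG protospacer adjacent motif used here are commonly cited values for S. pyogenes Cas9 and are assumptions not stated in the passage above; the function name is hypothetical, and only the forward strand is searched.

```python
import re
from typing import List, Tuple


def find_candidate_sites(seq: str, guide_len: int = 20) -> List[Tuple[int, str]]:
    """Return (start, protospacer) pairs followed by an NGG motif.

    Assumptions: NGG PAM and 20-nt guide (illustrative defaults);
    forward strand only; overlapping sites are reported.
    """
    seq = seq.upper()
    # A lookahead lets the scan report overlapping candidate sites.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % guide_len)
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    demo = "A" * 20 + "TGG"  # hypothetical sequence with one candidate site
    print(find_candidate_sites(demo))
```

A real design workflow would also scan the reverse strand and score off-target matches, which this sketch deliberately leaves out.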

CRISPR system can be used for the creation of genetically modified cells grown in culture and living organisms [11]. In the first case, plasmids or viral vectors which provide high and stable synthesis of CRISPR/Cas9 system elements are introduced into cells. In the second case, cultured protoplasts and a plasmid coding CRISPR/Cas elements are used to obtain genetically modified plants [32]. Another approach, applied for plants, is the use of Agrobacterium, the natural "genetic engineer," that contains a special plasmid harboring CRISPR/Cas9 system [41, 44].

Thus, due to its simplicity, efficiency, and wide capabilities, in a short time CRISPR/Cas9 system has already found use in various fields of fundamental and applied biology, biotechnology, and genetic engineering.

2.5. Repair of Cleaved Genomic Sites. An important step in the genome editing process is the repair of the DNA break created by the nucleases. DNA break gets repaired by the endogenous cellular mechanisms: nonhomologous end-joining (NHEJ) or homology-dependent (or directed) repair (HDR) [14]. NHEJ is the simplest mechanism where the ends of the cleaved DNA are joined together, often resulting in the insertion or deletion of nucleotides (indels) thereby shifting the gene reading frame, resulting in a gene "knockout" [72]. If indels are not observed, the DNA is recovered, and there are no noticeable changes. On the other hand, HDR is a mechanism where a sequence containing homology to target is used as a template for repairing the break or the DNA lesion. Therefore, by providing a template that contains a desired sequence of interest flanked by sequences homologous to both sides of the break point, one can force the insertion of that desired sequence into the target site. When HDR occurs, a homologous recombination is used to enable new sequences for gene recovery or insertion [72]. This method is simple, provides the exact impact on DNA target, and can be used at almost any modern molecular biology laboratory.
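The link between NHEJ indels and reading-frame shifts comes down to whether the net length change is a multiple of three. The sketch below assumes the indel lies inside a protein-coding sequence; the function name and the outcome labels are hypothetical.

```python
def indel_outcome(net_length_change: int) -> str:
    """Classify a repair outcome by its net length change in base pairs.

    Positive values are insertions and negative values are deletions.
    Assumption: the edit lies within a coding sequence.
    """
    if net_length_change == 0:
        return "no change"
    if net_length_change % 3 == 0:
        return "in-frame"
    return "frameshift"


if __name__ == "__main__":
    for change in (-4, -3, -1, 0, 1, 3):
        print(change, indel_outcome(change))
```

In-frame indels add or remove whole codons, so they do not shift the reading frame, which is why not every indel produces a knockout.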

3. Practical Applications of Genome Editing Systems

3.1. Application of "Genome Editors" for Functional Genomics. Several different types of genome modifications can be achieved by utilizing ZFN, TALEN, ODM, and CRISPR/Cas genome editing systems (Table 2). These include creation of point mutations, insertion of new genes in specific locations or deletion of large regions of the nucleotide sequences, and correction or substitution of individual genetic elements and gene fragments [4, 6, 10, 20, 23, 44, 73].

By introducing modifications to various genomic elements in plant cells and examining the results, scientists have been able to investigate the role of individual genes in the functioning of individual cells and the organism as a whole. For example, the unique ability of the CRISPR/Cas9 system to selectively bind to specific DNA sites has helped to regulate gene activity [24, 41, 44]. For this purpose, proteins that activate or repress the promoters controlling gene function can be attached to a catalytically inactive mutant Cas9 protein. In one example, it was shown that binding of the complex to the target DNA can inhibit or stimulate the function of the target gene [44].

Furthermore, using CRISPR/Cas9 system, several genetic constructs targeted to different genome sites can simultaneously be introduced into cells [8, 24, 43]. This is a welcome feature in investigating intergenic interaction, if any, because several genes are simultaneously affected by the CRISPR/Cas9 system [44]. For example, using this approach, it was possible to identify genes involved in crop domestication process [74].

3.2. Application of Genome Editing Systems in Crop Improvement. Genome editing technologies have wide practical applications for solving one of the most important tasks of modern biotechnology: the creation of new varieties of crops that are high-yielding, resistant to abiotic and biotic stresses, and of high nutritional value (Table 2) [31, 63, 75-80]. To this end, genome editing systems have been used in plant breeding (1) to insert point mutations similar to natural SNPs [26, 27], (2) to make small modifications to gene function [13], (3) for integration of foreign genes, (4) for gene pyramiding and knockout, and (5) for the repression or activation of gene expression, as well as (6) epigenetic editing [6].

For example, the use of ZFN in Arabidopsis thaliana [15-17] and Zea mays [18] has led to the successful development of herbicide-tolerant genotypes through insertion of herbicide-resistance genes into targeted sites in the genome [18]. ZFN was also used for the targeted modification of an endogenous malate dehydrogenase (MDH) gene in plants, and the plants containing modified MDH have shown increased yield [81]. The ODM technique has been significantly advanced through the Cibus Rapid Trait Development System (RTDS) [7], and this technology has been successfully applied in several crops. Applications include but are not limited to the generation of herbicide tolerance, insect resistance, enhanced disease resistance (bacterial and viral), improved nutritional value, and enhanced yield without the introduction of foreign genes, in contrast to the traditional genetic engineering approach to crop development [7, 65]. A precise edit of CAC to TAC using ODM RTDS technology has been demonstrated that converts BFP to GFP by changing Histidine (H66) to Tyrosine (Y66) in the protein. This approach has offered a nontransgenic breeding tool for crops [7, 64].
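The CAC-to-TAC edit mentioned above is a single-base substitution that changes one codon. The sketch below applies it and looks up the resulting amino acid; the two-entry table contains only the codons named in the text (from the standard genetic code), and the function name is hypothetical.

```python
# Only the two codons discussed in the text; not a full codon table.
CODON_TO_AMINO_ACID = {"CAC": "His", "TAC": "Tyr"}


def edit_codon(codon: str, position: int, new_base: str) -> str:
    """Return `codon` with the base at 0-based `position` replaced."""
    if len(codon) != 3 or not 0 <= position < 3:
        raise ValueError("expected a 3-base codon and a position of 0, 1, or 2")
    return codon[:position] + new_base + codon[position + 1:]


if __name__ == "__main__":
    original = "CAC"
    edited = edit_codon(original, 0, "T")
    print(original, CODON_TO_AMINO_ACID[original])
    print(edited, CODON_TO_AMINO_ACID[edited])
```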

Using the CRISPR/Cas9 technology, Jiang et al. [28] have obtained "a biotech" oil from Camelina sativa seeds with an improved fatty acid composition, which makes it more beneficial to human health, more resistant to oxidation, and more appropriate for the production of certain commercial chemicals including biofuels [28]. Soyk et al. [29] used targeted mutagenesis of SP5G gene of tomato to create plants with rapid flowering and more compact bush, which in turn resulted in earlier harvest. In another effort, Osakabe et al. [31], using the CRISPR-induced mutagenesis of OST2 gene in Arabidopsis, were able to obtain new alleles that confer salt stress resistance to plants [31].

Modulation of the gibberellin biosynthesis by genome editing methods has allowed creation of dwarf fruit trees [30], which have great potential for increasing productivity through higher density plantings and reduced labor costs. This results in a reduction of land, water, pesticide, and fertilizer use [82]. In addition, genome editing for inhibition of ethylene biosynthesis, which plays a very important role in fruit ripening process [82] or its signaling pathways, enables creation of new varieties with extended shelf life [63].

A major area of application of genome editing approaches in plant breeding is the creation of varieties resistant to various pathogens and/or pests. These methods have been used to modify key stages of plant immunity at different levels in several crops. This goal can be achieved by modifying (1) susceptibility genes (S-genes), (2) resistance genes (R-genes), (3) genes regulating the interaction between the effector and target, and (4) genes regulating plant hormonal balance [78]. For example, wheat genotypes resistant to powdery mildew disease were obtained by TALEN- and CRISPR/Cas9-mediated genome editing of the mildew-resistance locus O (MLO) [34]. Genome editing technologies have also been used to produce plants resistant to bacterial leaf blight, caused by Xanthomonas oryzae pv. oryzae [21].

The CRISPR/Cas9 system has been investigated for its efficacy in providing interference against geminiviruses by using a transient transformation system in N. benthamiana, in which degradation/suppression of the curly top virus genome by single guide RNA/Cas9 (sgRNA/Cas9) has been demonstrated [35]. In other efforts, where sgRNAs specific for tomato yellow leaf curl virus (TYLCV) or bean yellow dwarf virus (BeYDV) sequences were introduced into N. benthamiana plants expressing Cas9 endonuclease and challenged with the corresponding viruses, it was demonstrated that the CRISPR/Cas9 system not only targeted viruses for degradation but also introduced mutations at the target sequences [36, 37] due to interference with the copy number of freely replicating viruses [78].

Metabolic pathways that regulate hormonal balance can also be modified using the genome editing technologies to enhance the immunomodulatory component of the plants immune system. This can be achieved by deactivating the ethylene-responsive factor (ERF). In particular, ethylene-dependent pathway in rice has been successfully modified by CRISPR/Cas9-mediated target OsERF922 gene mutations, resulting in increased resistance to Magnaporthe oryzae [38, 39].

CRISPR/Cas9 has been used to knock out eIF4E gene that encodes the eukaryotic translation initiation factor essential for translation of viruses, in Cucumis sativus, and that knockout confers resistance to viruses such as cucumber vein yellowing virus (CVYV), zucchini yellow mosaic virus (ZYMV), and papaya ring spot mosaic virus-W (PRSV-W) [83]. In addition, CRISPR/Cas9 was demonstrated to be an efficient system for rapid and efficient genome editing in Phytophthora sojae, an oomycete pathogen of Soybean, by modifying the pathogenicity gene (Avr4/6), thereby opening up an avenue for the much needed functional genomics work in Phytophthora sojae towards the ultimate goal of controlling this pathogen [83].

Similarly, existing genome editing methods, in particular, CRISPR/Cas9 method, have been successfully used to obtain plants resistant to herbicides [33]. For example, editing of ALS2 gene in maize (acetolactate synthase or ALS is a key enzyme in the biosynthesis of amino acids in plants and has been inhibited by sulfonylurea herbicides) allowed the creation of a mutant corn plant resistant to chlorsulfuron [33].

Another interesting area of biotechnology where the CRISPR/Cas9 system has significant application is the development of plants capable of synthesizing human proteins such as insulin, necessary for patients with diabetes mellitus, or albumin, which is used in the treatment of hemorrhagic shock, burns, hypoproteinemia, and cirrhosis [84]. At present, albumin is prepared from human plasma, which is in very limited supply; however, global demand for albumin is constantly growing and currently equals 500 tons per year. To meet the growing need, the human albumin gene has already been introduced into the rice genome using genomic engineering techniques [85]. Such expressed proteins can be isolated from the plant and animal tissues where they are synthesized and, after clarification, can be used for medical purposes.

Thus, as described above and extensively referenced herein, these novel genome editing techniques are being widely used for the purpose of crop improvement including new bioenergy crop developments [86]. However, the use of tissue culture with these GEEN methods may also create complexities that could slow the process of genome editing.

4. Safety Assessment Aspects of Genome Editing Systems

4.1. Nontarget Effects. Genome editing techniques, in essence, preserve the native genomic structure and, therefore, are considered as a safe technology for crop improvement. Despite this general understanding, there are some concerns related to the biosafety of crops created using these methods. One main concern in terms of its biosafety is the possibility of nontarget effects of synthetic nucleases during genome editing.

During the biotechnological application of genome editing methods, efficiency and specificity of the engineered nucleases are the two most important functional requirements and are closely related to the choice of the target site. For each endogenous genomic locus, efficiency of DNA cleavage (both target and nontarget) depends not only on the nuclease activity (such as FokI domains and Ruv domains of the Cas9 proteins), but also on the availability of a target site and affinity of the DNA-binding domain (e.g., TAL effector domains and guide RNA, gRNA) to the target sequence. Specificity of engineered nucleases largely depends on the binding affinity of nuclease-DNA, including the binding of zinc finger to DNA (ZFNs), TAL effector to DNA (TALENs), and hybridization of gRNA with DNA (CRISPR), although dimerization of FokI domain (ZFNs and TALENs) and Cas9 interaction with the motif contiguous to protospacer adjacent motif (PAM) may also play an important role [87]. In case of ZFNs, while examples abound with respect to the binding efficiency of canonical C2H2 binding domain containing ZFNs, investigations on the utility of noncanonical ZFNs such as those containing C3H1 binding domain have demonstrated high levels of binding efficiency [88].

To minimize nontarget effects of genome editing systems, a crucial aspect is the careful selection of sites for the introduction of the double-stranded breaks by performing a prior bioinformatics analysis [89]. When choosing the desired sites, sites of repeated sequences and sites having a high homology with other regions of the genome should be avoided. In this regard, to facilitate the selection of the target sites for nucleases and experimental verification of the presence of nontarget effects, several software packages were developed that enable nuclease design and validation [79, 87, 90].
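The site-selection rule described above (avoid repeats and regions with high homology elsewhere in the genome) can be sketched in a few lines of Python. This is only an illustration and not any of the cited software packages: it assumes an SpCas9-style NGG PAM, scans the forward strand only, and uses hypothetical function names and a plain mismatch count as the homology measure.

```python
from typing import Iterator, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def candidate_sites(genome: str, protospacer_len: int = 20) -> Iterator[Tuple[int, str]]:
    """Yield (position, protospacer) for forward-strand sites followed by an NGG PAM.

    Assumption: NGG PAM (SpCas9); the reverse strand is ignored in this sketch.
    """
    for i in range(len(genome) - protospacer_len - 2):
        pam = genome[i + protospacer_len : i + protospacer_len + 3]
        if pam[1:] == "GG":
            yield i, genome[i : i + protospacer_len]


def near_matches(genome: str, protospacer: str, max_mismatches: int = 3) -> int:
    """Count genome windows within max_mismatches of the protospacer (the site itself included)."""
    n = len(protospacer)
    return sum(
        hamming(genome[i : i + n], protospacer) <= max_mismatches
        for i in range(len(genome) - n + 1)
    )


def unique_sites(
    genome: str, protospacer_len: int = 20, max_mismatches: int = 3
) -> List[Tuple[int, str]]:
    """Keep candidate sites whose only near match in the genome is the site itself."""
    return [
        (pos, seq)
        for pos, seq in candidate_sites(genome, protospacer_len)
        if near_matches(genome, seq, max_mismatches) == 1
    ]
```

The brute-force scan is quadratic in genome length, so it only suits toy inputs; real design tools index the genome and score mismatches by position rather than counting them equally.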

4.2. Regulation of Plants Created by Genome Editing. The novel genome editing systems help to introduce stably inherited point modifications into the plant genome, and the transgenic region can be easily removed after editing a target gene. This allows creation of nontransgenic plants and improved crop varieties [22, 91-93]. These technologies are faster than traditional breeding methods and help to obtain null segregant lines that have lost the transgene insertion [94-97]. Plants with targeted mutations developed by genome editing technology are nearly identical to plants obtained by classical breeding, and their safety must be assessed taking into account the resulting product rather than the process used to create them [98-100]. In this context, ODM-derived products are in many cases indistinguishable from conventionally bred or traditional mutagenesis products; therefore, such products should not be regulated in the same way as the products generated by genetic engineering methods [7, 65]. Using the CRISPR-Cas9 system, it becomes possible to obtain marker-free genetically engineered crops, that is, crops without antibiotic resistance marker genes [6, 100]. Thus, in the case of new varieties with targeted mutations, developed using genome editing systems, the existing operating rules for the regulation of genetically modified plants should not be applied [92, 95, 99, 100]. Currently genome editing technologies are being discussed by various advisory and regulatory authorities in the context of GMO legislation. Cultures and plants obtained using genome editing techniques are considered as nongenetically modified [95, 99, 101]. The European Commission is expected to publish a report on regulatory uncertainty of genome editing methods [100, 102, 103].

5. Multitude of Advantages and Perspectives of GEENs

Tools of genome editing have a significant impact on basic and applied research in plant biology [24, 43, 44, 73]. The simplified approach to gene/genome editing represents a valuable tool for plant researchers in functional analysis of gene(s) and for breeders in the integration of key genes in the genomes of agriculturally important crops. Genome editing systems have several attractive features including simplicity, efficiency, high specificity, minimal nontarget effects, and amenability to multiplexing and thus are very promising for use in plant breeding [6].

Site-directed mutagenesis of different genes can provide important information about their functions. Simultaneous targeting of multiple genes/loci by applying multiplex strategies can promote research to identify the role of individual genes in the intracellular signaling pathways and aid in the engineering of complex, multigenic agronomic traits in crops. Preferred uses of the CRISPR-Cas9 system include complete knockout of gene function [6, 64], microRNA knockdown screening [6], and programmed editing of certain loci, which can provide a functional separation of cis- and trans-regulatory elements/factors with high accuracy [6]. Another prospective application of the CRISPR-Cas9 system may be its use in the formation of conditional alleles, providing spatial and temporal control of gene expression to study the function of lethal genes. Use of inducible or tissue-specific promoters for expression of Cas9 and/or sgRNA can be instrumental for regulating gene expression in a specific tissue, at a specific developmental stage, or under different environmental conditions [6].

The CRISPR-Cas system opens up wide possibilities for labeling endogenous genes with fluorescent proteins to visualize their expression in vivo. Using fluorescently labeled dCas9, changes in genome dynamics and chromosome architecture during plant development, and their response to environmental stimuli, can be studied. These technologies can also be used for the selection of specific cell types, which greatly facilitates the study of various functional aspects [6]. Use of dCas9 can provide a new platform for targeting activation/repression effector domains to specific genomic loci for regulating endogenous gene expression.

In addition, these technologies can be successfully used in the work on epigenome editing via the selection of proteins responsible for histone modification and DNA methylation, which has emerged as a new way of regulating cellular functions in plants [25]. For the purpose of understanding epigenetic regulation, CRISPR-Cas9 system can also be used for the enrichment of chromatin target sites for the identification of proteins attached to enriched chromatin. Likewise, CRISPR-Cas9 can be used as a tool to identify regulatory proteins binding to specific DNA sequences controlling the expression of genes.

Genome editing tools are becoming popular molecular tools of choice for functional genomics as well as crop improvement. Many examples exist currently where these editing systems are being harnessed for unprecedented understanding of plant biology and crop yield improvement through rapid and targeted mutagenesis and associated breeding [102, 104]. Because of their several attractive features such as simplicity, efficiency, high specificity, and amenability to multiplexing, genome editing technologies described here are revolutionizing the way crop breeding is done and paving the way for the next generation breeding.

The authors declare that there are no conflicts of interest regarding the publication of this article.

The authors thank Academy of Sciences of Uzbekistan and Science and Technology Agency of Uzbekistan for Research Grants nos. FA-F5-021 and FA-F5-025.

[1] M. F. Singer, "Introduction and historical background," in Genetic Engineering, J. K. Setlow and A. Hollaender, Eds., vol. 1, pp. 1-13, Plenum, New York, NY, USA, 1979.

[2] A. A. Nemudryi, K. R. Valetdinova, S. P. Medvedev, and S. M. Zakian, "TALEN and CRISPR/Cas genome editing systems: tools of discovery," Acta Naturae, vol. 6, no. 22, pp. 19-40, 2014.

[3] Y.-G. Kim, J. Cha, and S. Chandrasegaran, "Hybrid restriction enzymes: zinc finger fusions to Fok I cleavage domain," Proceedings of the National Academy of Sciences of the United States of America, vol. 93, no. 3, pp. 1156-1160, 1996.

[4] T. Gaj, C. A. Gersbach, and C. F. Barbas III, "ZFN, TALEN, and CRISPR/Cas-based methods for genome engineering," Trends in Biotechnology, vol. 31, no. 7, pp. 397-405, 2013.

[5] D. P. Weeks, M. H. Spalding, and B. Yang, "Use of designer nucleases for targeted gene and genome editing in plants," Plant Biotechnology Journal, vol. 14, no. 2, pp. 483-495, 2016.

[6] V. Kumar and M. Jain, "The CRISPR-Cas system for plant genome editing: Advances and opportunities," Journal of Experimental Botany, vol. 66, no. 1, pp. 47-57, 2015.

[7] N. J. Sauer, J. Mozoruk, R. B. Miller et al., "Oligonucleotide-directed mutagenesis for precision gene editing," Plant Biotechnology Journal, vol. 14, no. 2, pp. 496-502, 2016.

[8] J. F. Li, J. E. Norville, and J. Aach, "Multiplex and homologous recombination-mediated genome editing in Arabidopsis and Nicotiana benthamiana using guide RNA and Cas9," Nature Biotechnology, vol. 31, no. 8, pp. 688-691, 2013.

[9] M. Jinek, K. Chylinski, I. Fonfara, M. Hauer, J. A. Doudna, and E. Charpentier, "A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity," Science, vol. 337, no. 6096, pp. 816-821, 2012.

[10] B. Chen, J. Hu, R. Almeida et al., "Expanding the CRISPR imaging toolset with Staphylococcus aureus Cas9 for simultaneous imaging of multiple genomic loci," Nucleic Acids Research, vol. 44, no. 8, p. e75, 2016.

[11] S. W. Cho, S. Kim, J. M. Kim, and J.-S. Kim, "Targeted genome engineering in human cells with the Cas9 RNA-guided endonuclease," Nature Biotechnology, vol. 31, no. 3, pp. 230-232, 2013.

[12] A. Noman, M. Aqeel, and S. He, "CRISPR-Cas9: Tool for qualitative and quantitative plant genome editing," Frontiers in Plant Science, vol. 7, no. 2016, article no. 1740, 2016.

[13] Y. Mao, H. Zhang, N. Xu, B. Zhang, F. Gou, and J.-K. Zhu, "Application of the CRISPR-Cas system for efficient genome engineering in plants," Molecular Plant, vol. 6, no. 6, pp. 2008-2011, 2013.

[14] P. D. Hsu, D. A. Scott, J. A. Weinstein et al., "DNA targeting specificity of RNA-guided Cas9 nucleases," Nature Biotechnology, vol. 31, pp. 827-832, 2013.

[15] K. Osakabe, Y. Osakabe, and S. Toki, "Site-directed mutagenesis in Arabidopsis using custom-designed zinc finger nucleases," Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 26, pp. 12034-12039, 2010.

[16] F. Zhang, M. L. Maeder, E. Unger-Wallace et al., "High frequency targeted mutagenesis in Arabidopsis thaliana using zinc finger nucleases," Proceedings of the National Academy of Sciences of the United States of America, vol. 107, no. 26, pp. 12028-12033, 2010.

[17] J. A. Townsend, D. A. Wright, R. J. Winfrey et al., "High-frequency modification of plant genes using engineered zinc-finger nucleases," Nature, vol. 459, no. 7245, pp. 442-445, 2009.

[18] V. K. Shukla, Y. Doyon, J. C. Miller et al., "Precise genome modification in the crop species Zea mays using zinc-finger nucleases," Nature, vol. 459, pp. 437-441, 2009.

[19] M. Christian, Y. Qi, Y. Zhang, and D. F. Voytas, "Targeted Mutagenesis of Arabidopsis thaliana Using Engineered TAL Effector Nucleases," G3: Genes, Genomes, Genetics, vol. 3, no. 9, pp. 1697-1705, 2013.

[20] H. Zhang, F. Gou, J. Zhang et al., "TALEN-mediated targeted mutagenesis produces a large variety of heritable mutations in rice," Plant Biotechnology Journal, vol. 14, no. 1, pp. 186-194, 2016.

[21] T. Li, B. Liu, M. H. Spalding, D. P. Weeks, and B. Yang, "High-efficiency TALEN-based gene editing produces disease-resistant rice," Nature Biotechnology, vol. 30, no. 5, pp. 390-392, 2012.

[22] M. M. Mahfouz, L. Li, M. Piatek et al., "Targeted transcriptional repression using a chimeric TALE-SRDX repressor protein," Plant Molecular Biology, vol. 78, no. 3, pp. 311-321, 2012.

[23] J. Gao, G. Wang, S. Ma et al., "CRISPR/Cas9-mediated targeted mutagenesis in Nicotiana tabacum," Plant Molecular Biology, vol. 87, no. 1-2, pp. 99-110, 2015.

[24] L. Cong, F. A. Ran, D. Cox et al., "Multiplex genome engineering using CRISPR/Cas systems," Science, vol. 339, no. 6121, pp. 819-823, 2013.

[25] H. Puchta, "Using CRISPR/Cas in three dimensions: towards synthetic plant genomes, transcriptomes and epigenomes," Plant Journal, vol. 87, no. 1, pp. 5-15, 2016.

[26] T. B. Jacobs, P. R. LaFayette, R. J. Schmitz, and W. A. Parrott, "Targeted genome modifications in soybean with CRISPR/Cas9," BMC Biotechnology, pp. 1-10, 2015.

[27] R. Xu, R. Qin, H. Li et al., "Generation of targeted mutant rice using a CRISPR-Cpf1 system," Plant Biotechnology Journal, vol. 14, pp. 1-5, 2016.

[28] W. Z. Jiang, I. M. Henry, P. G. Lynagh, L. Comai, E. B. Cahoon, and D. P. Weeks, "Significant enhancement of fatty acid composition in seeds of the allohexaploid, Camelina sativa, using CRISPR/Cas9 gene editing," Plant Biotechnology Journal, vol. 15, no. 5, pp. 648-657, 2017.

[29] S. Soyk, N. A. Muller, S. J. Park et al., "Variation in the flowering gene SELF PRUNING 5G promotes day-neutrality and early yield in tomato," Nature Genetics, vol. 49, no. 1, pp. 162-168, 2017.

[30] J. Peng, D. E. Richards, and N. M. Hartley, "Green revolution genes encode mutant gibberellin response modulators," Nature, vol. 400, no. 6741, pp. 256-261, 1999.

[31] Y. Osakabe, T. Watanabe, S. S. Sugano et al., "Optimization of CRISPR/Cas9 genome editing to modify abiotic stress responses in plants," Scientific Reports, vol. 6, Article ID 26685, 2016.

[32] Q. Shan, Y. Wang, and J. Li, "Targeted genome modification of crop plants using a CRISPR-Cas system," Nature Biotechnology, vol. 31, pp. 686-688, 2013.

[33] S. Svitashev, J. K. Young, C. Schwartz, H. Gao, S. C. Falco, and A. M. Cigan, "Targeted mutagenesis, precise gene editing, and site-specific gene insertion in maize using Cas9 and guide RNA," Plant Physiology, vol. 169, no. 2, pp. 931-945, 2015.

[34] Y. Wang, X. Cheng, and Q. Shan, "Simultaneous editing of three homoeoalleles in hexaploid bread wheat confers heritable resistance to powdery mildew," Nature Biotechnology, vol. 32, pp. 947-952, 2014.

[35] X. Ji, H. Zhang, Y. Zhang, Y. Wang, and C. Gao, "Establishing a CRISPR-Cas-like immune system conferring DNA virus resistance in plants," Nature Plants, vol. 1, Article ID 15144, 2015.

[36] Z. Ali, A. Abulfaraj, A. Idris, S. Ali, M. Tashkandi, and M. M. Mahfouz, "CRISPR/Cas9-mediated viral interference in plants," Genome Biology, vol. 16, no. 1, article 238, 2015.

[37] N. J. Baltes, A. W. Hummel, and E. Konecna, "Conferring resistance to geminiviruses with the CRISPR-Cas prokaryotic immune system," Nature Plants, vol. 1, article 15145, 2015.

[38] D. Liu, X. Chen, J. Liu, J. Ye, and Z. Guo, "The rice ERF transcription factor OsERF922 negatively regulates resistance to Magnaporthe oryzae and salt tolerance," Journal of Experimental Botany, vol. 63, no. 10, pp. 3899-3912, 2012.

[39] F. Wang, C. Wang, P. Liu et al., "Enhanced rice blast resistance by CRISPR/Cas9-targeted mutagenesis of the ERF transcription factor gene OsERF922," PLoS ONE, vol. 11, no. 4, Article ID e0154027, 2016.

[40] J. Chandrasekaran, M. Brumin, D. Wolf et al., "Development of broad virus resistance in non-transgenic cucumber using CRISPR/Cas9 technology," Molecular Plant Pathology, vol. 17, no. 7, pp. 1140-1153, 2016.

[41] F. Zhang, Y. Wen, and X. Guo, "CRISPR/Cas9 for genome editing: Progress, implications and challenges," Human Molecular Genetics, vol. 23, no. 1, pp. R40-R46, 2014.

[42] J. F. Petolino and J. P. Davies, "Designed transcriptional regulators for trait development," Plant Science, vol. 201-202, no. 1, pp. 128-136, 2013.

[43] H. Wang, H. Yang, C. S. Shivalila et al., "One-step generation of mice carrying mutations in multiple genes by CRISPR/Cas-mediated genome engineering," Cell, vol. 153, no. 4, pp. 910-918, 2013.

[44] L. Lowder, A. Malzahn, and Y. Qi, "Rapid evolution of manifold CRISPR systems for plant genome editing," Frontiers in Plant Science, vol. 7, no. 2016, article no. 1683, 2016.

[45] D. G. Knorre and V. V. Vlasov, "Reactive derivatives of nucleic acids and their components as affinity reagents," Russian Chemical Reviews, vol. 54, no. 9, pp. 836-851, 1985.

[46] N. J. Palpant and D. Dudzinski, "Zinc finger nucleases: Looking toward translation," Gene Therapy, vol. 20, no. 2, pp. 121-127, 2013.

[47] R. Jankele and P. Svoboda, "TAL effectors: Tools for DNA-Targeting," Briefings in Functional Genomics, vol. 13, no. 5, pp. 409-419, 2014.

[48] C. O. Pabo, E. Peisach, and R. A. Grant, "Design and selection of novel Cys2His2 zinc finger proteins," Annual Review of Biochemistry, vol. 70, pp. 313-340, 2001.

[49] T. Cathomen and J. Keith Joung, "Zinc-finger nucleases: the next generation emerges," Molecular Therapy, vol. 16, no. 7, pp. 1200-1207, 2008.

[50] J. F. Petolino, "Genome editing in plants via designed zinc finger nucleases," In Vitro Cellular and Developmental Biology-Plant, vol. 51, no. 1, 2015.

[51] N. P. Pavletich and C. O. Pabo, "Zinc finger-DNA recognition: crystal structure of a Zif268-DNA complex at 2.1 Å," Science, vol. 252, no. 5007, pp. 809-817, 1991.

[52] W. M. Ainley, L. Sastry-Dent, M. E. Welter et al., "Trait stacking via targeted genome editing," Plant Biotechnology Journal, vol. 11, no. 9, pp. 1126-1134, 2013.

[53] J. F. Petolino, A. Worden, K. Curlee et al., "Zinc finger nuclease-mediated transgene deletion," Plant Molecular Biology, vol. 73, no. 6, pp. 617-628, 2010.

[54] S. Schornack, A. Meyer, P. Romer, T. Jordan, and T. Lahaye, "Gene-for-gene-mediated recognition of nuclear-targeted AvrBs3-like bacterial effector proteins," Journal of Plant Physiology, vol. 163, no. 3, pp. 256-272, 2006.

[55] P. Romer, S. Hahn, T. Jordan, T. Strauß, U. Bonas, and T. Lahaye, "Plant pathogen recognition mediated by promoter activation of the pepper Bs3 resistance gene," Science, vol. 318, no. 5850, pp. 645-648, 2007.

[56] J. Boch, H. Scholze, S. Schornack et al., "Breaking the code of DNA binding specificity of TAL-type III effectors," Science, vol. 326, no. 5959, pp. 1509-1512, 2009.

[57] B. M. Lamb, A. C. Mercer, and C. F. Barbas III, "Directed evolution of the TALE N-terminal domain for recognition of all 5′ bases," Nucleic Acids Research, vol. 41, no. 21, pp. 9779-9785, 2013.

[58] M. Christian, T. Cermak, E. L. Doyle et al., "Targeting DNA double-strand breaks with TAL effector nucleases," Genetics, vol. 186, no. 2, pp. 757-761, 2010.

[59] L. Cong, R. H. Zhou, Y.-C. Kuo, M. Cunniff, and F. Zhang, "Comprehensive interrogation of natural TALE DNA-binding modules and transcriptional repressor domains," Nature Communications, vol. 3, article 968, 2012.

[60] M. L. Christian, Z. L. Demorest, C. G. Starker et al., "Targeting G with TAL Effectors: A Comparison of Activities of TALENs Constructed with NN and NK Repeat Variable Di-Residues," PLoS ONE, vol. 7, no. 9, Article ID e45383, 2012.

[61] J. Streubel, C. Blucher, A. Landgraf, and J. Boch, "TAL effector RVD specificities and efficiencies," Nature Biotechnology, vol. 30, no. 7, pp. 593-595, 2012.

[62] A. N.-S. Mak, P. Bradley, R. A. Cernadas, A. J. Bogdanove, and B. L. Stoddard, "The crystal structure of TAL effector PthXo1 bound to its DNA target," Science, vol. 335, no. 6069, pp. 716-719, 2012.

[63] J. Xiong, J. Ding, and Y. Li, "Genome-editing technologies and their potential application in horticultural crop breeding," Horticulture Research, vol. 2, article 15019, 2015.

[64] I. Y. Abdurakhmonov, "Genomics Era for Plants and Crop Species--Advances Made and Needed Tasks Ahead," in Plant Genomics, I. Abdurakhmonov, Ed., InTech, Croatia, Balkans, 2016.

[65] CropLife International, "Oligonucleotide-Directed Mutagenesis (ODM)," LJournal, 2017.

[66] R. Barrangou, C. Fremaux, and H. Deveau, "CRISPR provides acquired resistance against viruses in prokaryotes," Science, vol. 315, no. 5819, pp. 1709-1712, 2007.

[67] E. Deltcheva, K. Chylinski, C. M. Sharma et al., "CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III," Nature, vol. 471, no. 7340, pp. 602-607, 2011.

[68] D. H. Haft, J. Selengut, E. F. Mongodin, and K. E. Nelson, "A guild of 45 CRISPR-associated (Cas) protein families and multiple CRISPR/cas subtypes exist in prokaryotic genomes," PLoS Computational Biology, vol. 1, article e60, no. 6, pp. 0474-0483, 2005.

[69] A. F. Gilles and M. Averof, "Functional genetics for all: Engineered nucleases, CRISPR and the gene editing revolution," EvoDevo, vol. 5, no. 1, article no. 43, 2014.

[70] J. A. Doudna and E. Charpentier, "The new frontier of genome engineering with CRISPR-Cas9," Science, vol. 346, no. 6213, 2014.

[71] D. B. Graham and D. E. Root, "Resources for the design of CRISPR gene editing experiments," Genome Biology, vol. 16, no. 1, article no. 260, 2015.

[72] L. C. Perkin, S. L. Adrianos, and B. Oppert, "Gene disruption technologies have the potential to transform stored product insect pest control," Insects, vol. 7, no. 3, article no. 46, 2016.

[73] P. Perez-Pinera, D. G. Ousterout, and C. A. Gersbach, "Advances in targeted genome editing," Current Opinion in Chemical Biology, vol. 16, no. 3-4, pp. 268-277, 2012.

[74] L. Chen, L. Tang, H. Xiang et al., "Advances in genome editing technology and its promising application in evolutionary and ecological studies," GigaScience, vol. 3, no. 1, article no. 24, 2014.

[75] C. Kissoudis, C. van de Wiel, R. G. F. Visser, and G. van der Linden, "Enhancing crop resilience to combined abiotic and biotic stress through the dissection of physiological and molecular crosstalk," Frontiers in Plant Science, vol. 5, no. MAY, article no. 207, 2014.

[76] L. Liu and X.-D. Fan, "CRISPR-Cas system: A powerful tool for genome engineering," Plant Molecular Biology, vol. 85, no. 3, pp. 209-218, 2014.

[77] M. Jain, "Function genomics of abiotic stress tolerance in plants: A CRISPR approach," Frontiers in Plant Science, vol. 6, no. MAY, article no. 375, pp. 1-4, 2015.

[78] G. Andolfo, P. Iovieno, L. Frusciante, and M. R. Ercolano, "Genome-editing technologies for enhancing plant disease resistance," Frontiers in Plant Science, vol. 7, no. 2016, article no. 1813, 2016.

[79] S. Khatodia, K. Bhatotia, N. Passricha, S. M. P. Khurana, and N. Tuteja, "The CRISPR/Cas genome-editing tool: Application in improvement of crops," Frontiers in Plant Science, vol. 7, no. 2016, article no. 506, 2016.

[80] R. C. Nongpiur, S. L. Singla-Pareek, and A. Pareek, "Genomics Approaches for Improving Salinity Stress Tolerance in Crop Plants," Current Genomics, vol. 17, no. 4, pp. 343-357, 2016.

[81] V. Shukla, M. Gupta, F. Urnov, D. Guschin, M. Jan, and P. Bundock, "Targeted modification of malate dehydrogenase, 2013." WO Patent Publication Number: WO 2013166315 A1.

[82] C. A. Hollender and C. Dardick, "Molecular basis of angiosperm tree architecture," New Phytologist, vol. 206, no. 2, pp. 541-556, 2015.

[83] Y. Fang and B. M. Tyler, "Efficient disruption and replacement of an effector gene in the oomycete Phytophthora sojae using CRISPR/Cas9," Molecular Plant Pathology, vol. 17, no. 1, pp. 127-139, 2016.

[84] G. E. Hastings and P. G. Wolf, "The Therapeutic Use of Albumin," Archives of Family Medicine, vol. 1, no. 2, pp. 281-287, 1992.

[85] Y. He, T. Ning, T. Xie et al., "Large-scale production of functional human serum albumin from transgenic rice seeds," Proceedings of the National Academy of Sciences of the United States of America, vol. 108, no. 47, pp. 19078-19083, 2011.

[86] M. Bosch and S. P. Hazen, "Lignocellulosic feedstocks: Research progress and challenges in optimizing biomass quality and yield," Frontiers in Plant Science, vol. 4, article no. 474, 2013.

[87] C. M. Lee, T. J. Cradick, E. J. Fine, and G. Bao, "Nuclease target site selection for maximizing on-target activity and minimizing off-target effects in genome editing," Molecular Therapy, vol. 24, no. 3, pp. 475-487, 2016.

[88] Q. C. Cai, J. Miller, F. Urnov et al., "Optimized non-canonical zinc finger proteins," US Patent Number: 9,187,758. Publication date: Nov 17, 2015.

[89] A. Lombardo, D. Cesana, P. Genovese et al., "Site-specific integration and tailoring of cassette design for sustainable gene transfer," Nature Methods, vol. 8, no. 10, pp. 861-869, 2011.

[90] T. Koo, J. Lee, and J. Kim, "Measuring and reducing off-target activities of programmable nucleases including CRISPR-Cas9," Molecules and Cells, vol. 38, no. 6, pp. 475-481, 2015.

[91] Y. Gao and Y. Zhao, "Specific and heritable gene editing in Arabidopsis," Proceedings of the National Academy of Sciences of the United States of America, vol. 111, no. 12, pp. 4357-4358, 2014.

[92] C. Nagamangala Kanchiswamy, D. J. Sargent, R. Velasco, M. E. Maffei, and M. Malnoy, "Looking forward to genetically edited fruit crops," Trends in Biotechnology, vol. 33, no. 2, pp. 62-64, 2015.

[93] R.-F. Xu, H. Li, R.-Y. Qin et al., "Generation of inheritable and "transgene clean" targeted genome-modified rice in later generations using the CRISPR/Cas9 system," Scientific Reports, vol. 5, Article ID 11491, 2015.

[94] N. Podevin, Y. Devos, H. V. Davies, and K. M. Nielsen, "Transgenic or not? No simple answer! New biotechnology-based plant breeding techniques and the regulatory landscape," EMBO Reports, vol. 13, no. 12, pp. 1057-1061, 2012.

[95] M. Araki and T. Ishii, "Towards social acceptance of plant breeding by genome editing," Trends in Plant Science, vol. 20, no. 3, pp. 145-149, 2015.

[96] J. G. Schaart, C. C. M. van de Wiel, L. A. P. Lotz, and M. J. M. Smulders, "Opportunities for Products of New Plant Breeding Techniques," Trends in Plant Science, vol. 21, no. 5, pp. 438-449, 2016.

[97] J. W. Woo, J. Kim, S. I. Kwon et al., "DNA-free genome editing in plants with preassembled CRISPR-Cas9 ribonucleoproteins," Nature Biotechnology, vol. 33, no. 11, pp. 1162-1164, 2015.

[98] F. Hartung and J. Schiemann, "Precise plant breeding using new genome editing techniques: Opportunities, safety and regulation in the EU," Plant Journal, vol. 78, no. 5, pp. 742-752, 2014.

[99] D. F. Voytas and C. Gao, "Precision genome engineering and agriculture: opportunities and regulatory challenges," PLoS Biology, vol. 12, no. 6, p. e1001877, 2014.

[100] H. D. Jones, "Regulatory uncertainty over genome editing," Nature Plants, vol. 1, Article ID 14011, 2015.

[101] M. Lusser, C. Parisi, D. Plan, and E. Rodriguez-Cerezo, "Deployment of new biotechnologies in plant breeding," Nature Biotechnology, vol. 30, no. 3, pp. 231-239, 2012.

[102] K. Belhaj, A. Chaparro-Garcia, S. Kamoun, N. J. Patron, and V. Nekrasov, "Editing plant genomes with CRISPR/Cas9," Current Opinion in Biotechnology, vol. 32, pp. 76-84, 2015.

[103] J. D. Wolt, K. Wang, and B. Yang, "The regulatory status of genome-edited crops," Plant Biotechnology Journal, vol. 14, no. 2, pp. 510-518, 2016.

[104] S. Huang, D. Weigel, R. N. Beachy, and J. Li, "A proposed regulatory framework for genome-edited crops," Nature Genetics, vol. 48, no. 2, pp. 109-111, 2016.

Venera S. Kamburova, (1) Elena V. Nikitina, (1) Shukhrat E. Shermatov, (1) Zabardast T. Buriev, (1) Siva P. Kumpatla, (2) Chandrakanth Emani, (3) Ibrokhim Y. Abdurakhmonov (1)

(1) Center of Genomics and Bioinformatics, Academy of Sciences of the Republic of Uzbekistan, University Street-2, Qibray Region, 111215 Tashkent, Uzbekistan


Methods

Biological material

Six accessions of Hieracium subgenus Pilosella were employed in this study: Hieracium piloselloides (D36), Hieracium praealtum (R35), Hieracium pilosella (P36), m115 and m134, which were derived from R35 [16], and D18, derived from D36 by parthenogenesis of a rare meiotically reduced egg [15, 18, 30]. The species have a base chromosome number of n = 9. The mean 2C values of genome size range from 7.03 pg in diploids to 16.67 pg in tetraploid accessions [67]. Apomictic species of Hieracium are facultative; the proportion of ovules forming AI cells and undergoing fertilization-independent seed formation within D36 and R35 is estimated to be

99 %, respectively [15, 20]. Plants were vegetatively micropropagated to maintain clonal integrity. Plant growth conditions and AI phenotyping methods have been described previously [18, 24]. Grafting was performed as previously described [44, 68]. After flowering, ovaries formed in the grafted scion were fixed, cleared and scored at capitulum stages 4 and 10 for the presence or absence of enlarged AI-like cells in the ovule, defining the capability to initiate apomixis. Extent of embryo sac abortion was also scored. Chromosome spreads and fluorescence in situ hybridization (FISH) with an LOA267.14 BAC probe were performed as previously described [19].

D18 genomic DNA sequencing and assembly

FISH experiments confirmed that D18 consists of 18 chromosomes including the LOA-carrying chromosome, together with conserved LOA-linked markers [17]. D18 genomic DNA was extracted from leaves using an adaptation of the nuclear DNA enrichment method (Protocol C) described in [69]. Illumina sequencing of the D18 genomic DNA was undertaken by the Australian Genome Research Facility using a combination of 2 × 100 bp short insert (SI) paired-end and 2 × 100 bp standard insert paired-end sequencing using a HiSeq 2000 system.

The SI set generated fragment lengths distributed around 180 bp, therefore allowing 100 bp paired-end sequence reads from either end of the fragment to overlap. Prior to assembly, genomic reads were preprocessed to remove adapter- and/or vector-contaminated sequences, sequences containing N's, excess exact duplicate read pairs and isolated sequences that did not substantially overlap another sequence in the dataset. SI reads were processed in an additional step to merge overlapping SI reads. Processed reads from DNA sequencing were assembled using the BioKanga assembly algorithm (https://sourceforge.net/projects/biokanga/).
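Two of the preprocessing steps above, dropping read pairs that contain N's or are exact duplicates and merging overlapping short-insert pairs, can be illustrated with a minimal Python sketch. It is not the BioKanga implementation: the function names are hypothetical, only one copy of each duplicate pair is kept, and the merge requires an exact match in the overlap, whereas production tools tolerate sequencing errors.

```python
from typing import Iterable, List, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (upper-case A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def preprocess(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop pairs containing N and keep only the first copy of exact duplicate pairs."""
    seen = set()
    kept = []
    for read1, read2 in pairs:
        if "N" in read1 or "N" in read2:
            continue
        if (read1, read2) in seen:
            continue
        seen.add((read1, read2))
        kept.append((read1, read2))
    return kept


def merge_pair(read1: str, read2: str, min_overlap: int = 10) -> Optional[str]:
    """Merge a pair whose 3' ends overlap, returning the fragment or None.

    read2 is reverse-complemented so both reads share the fragment's orientation;
    the longest exact suffix/prefix overlap of at least min_overlap bases is used.
    """
    mate = reverse_complement(read2)
    for overlap in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        if read1[-overlap:] == mate[:overlap]:
            return read1 + mate[overlap:]
    return None
```

With 2 × 100 bp reads from a fragment of about 180 bp, the expected overlap is roughly 20 bp, which is why a minimum overlap well below that length is a plausible default here.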

The resulting genomic contigs were annotated for putative genic regions with the AUGUSTUS gene model prediction algorithm, with tomato as the training species, allowing for partial predictions and UTR predictions [70]. Both the genomic resource as a whole and translated predicted gene sequences were further annotated with alignments to known protein sequence sets from Arabidopsis (TAIR10), tomato (ITAG2.4), rice (MSU7), sorghum (Phytozome10 Sbicolor2.1), Physcomitrella patens (Phytozome10 PPatens3.0), Zea mays (Phytozome10 Zmays6a) and lettuce ESTs (National Center for Biotechnology Information, NCBI). Alignments to protein sequences were completed using blastp or tblastn as required, while genomic to EST alignments used a custom Blat-like algorithm, Blitz (https://sourceforge.net/projects/biokanga/). Blastp and tblastn alignments of predicted D18 protein sequences or CDSs to other species’ protein sequences reported up to 10 alignments for each D18 protein that met an e-value threshold of 1e-50 and had ≥50 % of D18 protein length covered by the alignment. For the purposes of the sexual pathway analysis, the top hit was chosen. Blitz alignments of EST sequences to D18 genomic sequences reported up to 10 alignments per EST sequence, with ≥60 % of EST sequence length included in alignment. Newly assembled Hieracium transcriptome contigs were also aligned to the D18 genomic contigs using Blitz with parameters as above. Sequence redundancy across the genomic contig set was assessed using a custom program to calculate edit (Hamming) distance between all possible 100-bp kmers within the assembly.
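The hit-selection criteria stated above for the protein alignments (e-value threshold of 1e-50, at least 50 % of the D18 protein covered, up to 10 alignments per protein, top hit for the pathway analysis) could be applied to parsed alignment records roughly as follows. The `Hit` record and its field names are assumptions for illustration and do not correspond to a specific BLAST output format; ranking by e-value is also an assumption, since the text does not say how the top hit was ordered.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Hit:
    """One alignment of a D18 protein (query) to a reference protein (subject)."""
    query: str
    subject: str
    evalue: float
    aligned_query_len: int  # residues of the query covered by the alignment
    query_len: int          # full length of the query protein


def filter_hits(
    hits: Iterable[Hit],
    max_evalue: float = 1e-50,
    min_coverage: float = 0.5,
    max_per_query: int = 10,
) -> Dict[str, List[Hit]]:
    """Keep hits passing both thresholds, best e-value first, at most max_per_query per query."""
    by_query: Dict[str, List[Hit]] = defaultdict(list)
    for hit in hits:
        coverage = hit.aligned_query_len / hit.query_len
        if hit.evalue <= max_evalue and coverage >= min_coverage:
            by_query[hit.query].append(hit)
    return {
        query: sorted(group, key=lambda h: h.evalue)[:max_per_query]
        for query, group in by_query.items()
    }


def top_hits(filtered: Dict[str, List[Hit]]) -> Dict[str, Hit]:
    """Best-scoring retained hit per query, as used for the sexual pathway analysis."""
    return {query: group[0] for query, group in filtered.items()}
```

Queries with no passing hit are simply absent from the result, so downstream code can tell an unannotated protein from one with a weak match.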

Transcriptome and small RNA sequencing

To generate the transcriptomic resource, whole ovaries were dissected and collected into liquid nitrogen. Total RNA was isolated from ovary and leaf tissue and fractionated into mRNA (polyA) and small RNA (mirVana miRNA Isolation Kit, Life Technologies). Selection of small RNAs (<35 nt) with polyacrylamide gel electrophoresis was performed during library preparation. Illumina sequencing of the small RNA libraries using 50-bp single-end sequencing and sequencing of the mRNA libraries using 100-bp sequencing was performed by the Australian Genome Research Facility using a HiSeq 2000 system. Two biological replicates of mRNA and one replicate of small RNA were sequenced. Raw reads from both were trimmed for adapter and low quality sequence, and small RNA sequence sets were filtered to retain lengths of 18 to 25 nucleotides. Only sequences observed a minimum of five times (reads) in any one sample were further analysed. RNA sequencing libraries were preprocessed to remove adapter sequences and low quality ends using the sequence trimming algorithm (mcf) and assembled as single-ended reads using the Trinity transcriptome assembly pipeline [71].
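The small RNA filtering rule described above (lengths of 18 to 25 nucleotides, observed at least five times in any one sample) can be sketched as follows. The function name and the input layout, a mapping from sample name to a list of trimmed read sequences, are assumptions for illustration only.

```python
from collections import Counter

def filter_small_rnas(samples, min_len=18, max_len=25, min_reads=5):
    """Return the set of sequences passing the length and abundance filters.

    samples: dict mapping sample name -> list of trimmed read sequences.
    """
    kept = set()
    for reads in samples.values():
        counts = Counter(reads)
        for seq, n in counts.items():
            # A sequence qualifies if it reaches min_reads in this sample.
            if min_len <= len(seq) <= max_len and n >= min_reads:
                kept.add(seq)
    return kept
```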

m115 and m134 SNP discovery

R35 ovule transcriptome contigs from both MMC and FM stages were used as a reference to realign RNA sequencing reads from the R35, m115 and m134 transcriptomes. Alignments were performed using the BioKanga aligner, allowing for reads to align to multiple loci and to have up to 10 % of the read length as substitutions. SNP discovery was based on two possible scenarios. The first scenario assumes that genes deleted from m134 and m115 contain a homoeologous or paralogous gene copy within the R35 genome. SNPs identified as heterozygous (AB) in R35 and homozygous (AA) in both m115 and m134 were selected for this category. The second scenario assumes hemizygosity of the deleted genes, and SNPs were therefore selected based on a homozygous genotype (A-) in R35 and an absence of reads (−−) aligned from both m115 and m134 mutants. SNP discovery was performed using BioKanga and required a minor allele frequency of at least 0.25 at any SNP locus. Sequences of LOA-linked contigs A and B [19] were used to identify syntenic regions within a genetic map of lettuce [22]. R35 transcriptome contigs identified from the SNP discovery process with similarity to lettuce ESTs within this syntenic region were considered for further analysis. To determine linkage to LOA, SCAR markers were designed and amplified from 9 LOA deletion mutants, 10 LOP deletion mutants and a subset of 42 progeny from a previously described R35 mapping population [20]. Primer sequences and PCR conditions are specified in Additional file 1: Table S8. Validated markers were mapped onto the R35 linkage map using JoinMap 4.0 as previously described [20].
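The two SNP selection scenarios can be expressed as a small classifier over per-sample allele read counts. This is a simplified sketch, not the BioKanga procedure: the function names are hypothetical, genotypes are called from two allele counts only, and the check that both mutants carry the same allele is omitted.

```python
def call_genotype(count_a, count_b, min_maf=0.25):
    """Call 'AB', 'AA' (single allele) or '--' (no reads) from allele counts."""
    total = count_a + count_b
    if total == 0:
        return "--"
    minor_fraction = min(count_a, count_b) / total
    return "AB" if minor_fraction >= min_maf else "AA"

def classify_snp(r35, m115, m134):
    """Each argument is a (count_a, count_b) tuple for one genotype."""
    g_r35, g_m115, g_m134 = (call_genotype(*c) for c in (r35, m115, m134))
    # Scenario 1: heterozygous in R35, homozygous in both mutants.
    if g_r35 == "AB" and g_m115 == "AA" and g_m134 == "AA":
        return "scenario 1"
    # Scenario 2: single allele in R35, no reads aligned in either mutant.
    if g_r35 == "AA" and g_m115 == "--" and g_m134 == "--":
        return "scenario 2"
    return None
```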

Differential expression and gene ontology analysis

To generate read counts for each predicted Hieracium gene model, RNA sequencing reads were aligned to the D18 genomic contig set using the BioKanga aligner, accepting only uniquely aligning reads with a maximum number of mismatches of 10 % the length of the read. The sum of reads aligned within AUGUSTUS gene model predictions were used as read counts for each predicted gene. All replicates were treated separately, and read count normalization and analysis of differential expression were assessed using the edgeR package within the R statistical software suite [72] (https://cran.r-project.org/). Gene models were considered differentially expressed if they met a statistical threshold of P ≤ 0.01, corrected for multiple testing and a minimum change of twofold.
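The final selection step (corrected P ≤ 0.01 and at least a twofold change) can be sketched in plain Python. The text does not name the multiple-testing correction, so the Benjamini-Hochberg procedure used here is an assumption, as are the function names; the P values and log2 fold changes would come from edgeR in the actual analysis.

```python
import math

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values (assumed correction method)."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def is_differentially_expressed(pvalues, log2_fold_changes,
                                alpha=0.01, min_fold=2.0):
    """Flag gene models meeting both the P value and fold-change thresholds."""
    adjusted = bh_adjust(pvalues)
    min_lfc = math.log2(min_fold)
    return [q <= alpha and abs(lfc) >= min_lfc
            for q, lfc in zip(adjusted, log2_fold_changes)]
```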

To test for enrichment of functional themes in lists of differentially expressed gene candidates, we utilized the annotation of Arabidopsis genes and their associated GO annotations. Arabidopsis was used instead of tomato, where there remain far fewer GO annotations than are available in Arabidopsis. We observed that annotations of Hieracium gene models to Arabidopsis and tomato genes generated substantial proportions of one-to-many matches. Where several distinct D18 gene models were found to match a single Arabidopsis gene, we multiplied the counts of the associated GO terms by the number of D18 gene models associated. We considered GO terms enriched at P ≤ log10-5. Enrichment was assessed against a background of all gene models found expressed in the ovary generated from maximum read counts across all genotypes.

Small RNA analysis

Small RNA sequences were analysed through alignment to predicted gene models within D18 with both 5’ and 3’ boundaries extended 500 bp, allowing up to two mismatches and a microindel of 1–2 nt. Small RNA sequences were filtered to lengths of 18–25 nt, and only sequences that were observed more than five times in any one sample were retained for further analysis. Total read counts for each gene model (+/−500 bp) were normalized using the edgeR package within the R statistical software suite [72] (https://cran.r-project.org/). MIRNA gene prediction was completed using methods previously used in rice [73], based on detecting potential miRNA-miRNA* pairs: distinct small RNAs perfectly aligned to genomic sequences within 400 bp of each other, between which a transcribed sequence could potentially form a hairpin secondary structure.
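The first step of the miRNA-miRNA* search, finding distinct small RNAs aligned within 400 bp of each other on the same contig, can be sketched as below. This covers only the proximity test; the hairpin folding check is not attempted. The input layout, the use of alignment start coordinates to measure distance, and the function name are assumptions for illustration.

```python
def candidate_pairs(alignments, max_gap=400):
    """Find pairs of distinct small RNAs aligned close together.

    alignments: list of (contig, start, sequence) for perfect alignments.
    Returns (contig, start1, start2) tuples with start2 - start1 <= max_gap.
    """
    by_contig = {}
    for contig, start, seq in alignments:
        by_contig.setdefault(contig, []).append((start, seq))
    pairs = []
    for contig, hits in by_contig.items():
        hits.sort()
        for i, (start1, seq1) in enumerate(hits):
            for start2, seq2 in hits[i + 1:]:
                if start2 - start1 > max_gap:
                    break  # hits are sorted, so later ones are farther away
                if seq1 != seq2:
                    pairs.append((contig, start1, start2))
    return pairs
```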

qRT-PCR analysis

Whole ovaries were dissected into liquid nitrogen, and RNA was extracted using the RNeasy Mini Kit (Qiagen). RNA was treated with RQ1 DNase (Promega), and the reaction was cleaned up using an RNeasy Mini column (Qiagen). cDNA synthesis was carried out using 1 μg of RNA in a SuperScript III first strand synthesis reaction (Invitrogen). qRT-PCR was carried out on an RG-3000 (Corbett Research) using LightCycler 480 SYBR Green I Master Mix (Roche) with the oligonucleotides listed in Additional file 1: Table S8. The average of two biological and two technical replicates is reported using the ΔΔCt method, with HpUBC21 as a reference gene. Genes were considered differentially expressed if they had >1.5-fold change and a paired t-test P value <0.05. The validation rate of differentially expressed gene models in our transcriptome analysis using qRT-PCR was 45 %. While low, this likely reflects the difficulty of designing allele-specific primers to D18 predicted gene models within the tetraploid plants. Oligos used for validation of the transcriptome analysis were designed from transcriptome contigs showing the best match to predicted D18 gene models. All qRT-PCR amplicons were sequenced to verify correct amplification.
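For reference, the ΔΔCt calculation can be written out in a few lines. This is a minimal sketch of the standard formula, fold change = 2^-(ΔCt_test - ΔCt_control) with ΔCt = Ct_target - Ct_reference; it assumes 100 % amplification efficiency, and the function and argument names are hypothetical.

```python
def ddct_fold_change(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression by the delta-delta-Ct method.

    Assumes perfect doubling per cycle (100 % primer efficiency).
    """
    delta_test = ct_target_test - ct_ref_test    # normalise to reference gene
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2 ** -(delta_test - delta_ctrl)
```

With a reference gene at Ct 20 in both samples, a target at Ct 24 in the test sample and Ct 26 in the control gives a fourfold higher expression in the test sample.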

ARGONAUTE cloning and phylogenetic analysis

A hidden Markov model (HMM) was generated using HMMER v. 2.1 (http://hmmer.org/) from the NCBI PLN03202 plant AGO alignment. The HMM was then used to search for assembled contigs which contained AGO-related domains within Hieracium whole ovary transcriptomes from both the R35 and P36 genotypes. Identified AGO transcripts were amplified from cDNA libraries derived from whole ovaries at FM stage in both the R35 and P36 backgrounds. Primers used for amplification are listed in Additional file 1: Table S8. PCR products were cloned into the pCR4-TOPO vector (Invitrogen). Multiple clones were sequenced and aligned to generate a consensus sequence which was used for further phylogenetic analysis. Full-length AGO consensus protein sequences from H. praealtum (R35) were aligned to reference AGO cDNAs from Arabidopsis and S. lycopersicum using Clustal Omega [74]. An unrooted tree was constructed within MEGA6 using the maximum likelihood method based on 1000 bootstrap replicates [75]. The tree with the highest log likelihood is shown. All positions with less than 95 % site coverage were eliminated. The final dataset contained a total of 715 positions.
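The site-coverage filter mentioned above (positions with less than 95 % site coverage eliminated) can be illustrated on a toy alignment. This is only a sketch of the idea, not MEGA6's implementation; the function name and the choice of `-` and `?` as gap or missing-data characters are assumptions.

```python
def filter_columns(alignment, min_coverage=0.95, gap_chars="-?"):
    """Remove alignment columns whose non-gap fraction is below min_coverage.

    alignment: list of equal-length aligned sequence strings.
    """
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    keep = [
        j for j in range(n_cols)
        if sum(seq[j] not in gap_chars for seq in alignment) / n_seqs >= min_coverage
    ]
    return ["".join(seq[j] for j in keep) for seq in alignment]
```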

In situ hybridization

Probe templates for HpAGO1a, HpAGO2b and HpAGO5 were amplified from pCR4-TOPO cDNA clones (described above) using oligos that introduced T7 and SP6 promoters at the 5’ and the 3’ end of the amplicon, respectively (Additional file 1: Table S8). The gel purified amplicons were then used as templates for probe synthesis with the DIG RNA Labeling Kit SP6/T7 (Roche). Hybridization and visualization were performed as previously described [76].

ARGONAUTE protein modelling

Models of the Hieracium HpAGO4cP36 and HpAGO4cR35 proteins were constructed by comparative (homology) modelling based on spatial restraints of the human Argonaute2 protein (accession [PDB:4OLB])[43, 77]. The HpAGO4c proteins and 4OLB protein sequences were aligned using AA-Annotator [78], followed by manual adjustments, and analysed for the dispositions of secondary structural elements [79]. The alignments were used as input parameters to build three-dimensional (3D) models within Modeller 9v8 [77]. The final 3D molecular model of both proteins was selected from 40 models that showed the lowest values of the ‘Modeller Objective Function’ and the most favourable Discrete Optimised Protein Energy (DOPE) scoring parameters [77, 80]. Stereochemical quality and overall G-factors were calculated with PROCHECK [81]. Z-score values for combined energy profiles were evaluated by Prosa2003 [82]. Structural super-positions were performed using the DeepView ‘iterative magic fit’ algorithm [83], where 760 and 751 residues (from totals of 866, 857 and 838 residues in HpAGO4cP36, HpAGO4cR35 and 4OLB, respectively) were aligned in Cα positions with root mean square deviation values of 0.68 Å and 0.72 Å for HpAGO4cP36 and HpAGO4cR35, respectively, excluding indels. Molecular graphics were generated with the PyMOL software package (http://www.pymol.org/).
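The root mean square deviation quoted above is computed over paired Cα positions after superposition. A minimal sketch of that final calculation, assuming the two coordinate lists are already superposed and matched residue by residue (the fitting step performed by DeepView is not reproduced, and the function name is hypothetical):

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates in Å."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```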


The Role of DNA Methylation in Abiotic Stress Memory

Somatic Stress Memory

Although abiotic stresses induce various chromatin changes in plants, most epigenetic changes are transient and quickly reset to pre-stress levels once the stress is removed. However, some chromatin changes induced by abiotic stresses are mitotically heritable and last for several days, or even for the rest of the plant's life within the same generation. In Arabidopsis, recurring dehydration stresses result in transcriptional stress memory, which is characterized by an increased rate of transcription and elevated transcript levels of some stress-response genes (Ding et al., 2012). Cold, drought and heat stress treatments can induce somatic abiotic stress memory lasting at least 3 days, which mainly involves changes in histone modifications, including H3K4me2/me3, H3K27me3 and H3K14ac (Lamke and Bäurle, 2017; Bäurle and Trindade, 2020). The memory of vernalization-induced FLC silencing can be maintained during subsequent growth and development under warm temperatures, which is associated with the establishment and maintenance of H3K27me3. In the pro-embryo, the seed-specific transcription factor LEAFY COTYLEDON1 (LEC1) promotes H3K27me3 demethylation and activation of FLC, thereby erasing the vernalization memory (Tao et al., 2017; He and Li, 2018). DNA methylation does not appear to be responsible for the stress-induced somatic memory described above. However, in rice, most salt-induced DNA methylation or demethylation alterations remain after recovery, suggesting that salinity-induced DNA methylation changes can record the environmental salt stress and transmit the stress-induced epigenetic states to daughter cells through mitotic cell divisions within the present generation (Wang et al., 2015). It remains a formal possibility that some locus-specific 5mC or 6mA changes function in the somatic memory of plant responses to abiotic stresses.

Transgenerational Inheritance of Stress Memory

Some abiotic stresses can induce transgenerational phenotypic changes along with chromatin alterations, which remain detectable for at least one non-stressed generation (Table 1). In Arabidopsis, short-wavelength radiation (ultraviolet-C, UV-C) or flagellin treatment increases the frequency of somatic homologous recombination of a transgenic reporter, and this increase persists in the next four untreated generations (Molinier et al., 2006). This was the first report of transgenerational epigenetic inheritance in plants. Since 2006, deciphering the transgenerational memory of plant stress responses has become a fascinating research area. Some stress responses are transmitted only to the direct progeny, which is termed intergenerational stress memory, while others are retained for at least two subsequent stress-free generations, which is known as transgenerational stress memory (Lamke and Bäurle, 2017).

Table 1. Examples of intergenerational and transgenerational stress memory in plants.

The intergenerational stress memory can be triggered by multiple biotic and abiotic stresses, such as flagellin (an elicitor of plant defense), ultraviolet-C, salt, cold, heat and drought stress, β-aminobutyric acid (BABA), methyl jasmonate and the bacterium Pseudomonas syringae pv. tomato (PstavrRpt2) (Table 1; Johnsen et al., 2005; Kvaalen and Johnsen, 2008; Sultan et al., 2009; Boyko et al., 2010; Ito et al., 2011; Scoville et al., 2011; Slaughter et al., 2012; Iwasaki and Paszkowski, 2014; Migicovsky et al., 2014; Bilichak et al., 2015; Wibowo et al., 2016; Ganguly et al., 2017; Bose et al., 2020). Interestingly, in perennial Scots pines (Pinus sylvestris L.), environmental memory of naturally dry conditions in the parental trees drives offspring survival and growth under hot-drought conditions (Bose et al., 2020). The stress memory may protect the immediate offspring against recurring stress or offer them the potential for local acclimation to changing environments, while resetting in the next generation may maximize growth under favorable circumstances (Crisp et al., 2016). The intergenerational stress memory may be mediated by the direct impact of environmental factors on gametogenesis, fertilization and embryo development, or by maternal cues that are transported into and stored in the seeds while the progeny develops on the mother plant. It remains unclear how much of the intergenerational stress memory is due to environment-induced epigenetic changes. The epigenetic regulators involved in the intergenerational stress memory remain largely unidentified, apart from several reports of possible roles for small RNAs and DNA methylation (Table 1; Boyko et al., 2010; Ito et al., 2011; Migicovsky et al., 2014; Bilichak et al., 2015; Wibowo et al., 2016). The hyperosmotic stress-induced responses are primarily maintained in the next generation through the female lineage, due to widespread DNA glycosylase activity in the male germline, and are extensively reset in the absence of stress (Wibowo et al., 2016). How the transient stress memory is maintained during meiosis in the stressed parental plants and removed or reset during the reproductive stage of the next generation remains to be investigated.

Increasing evidence indicates that many abiotic stress responses can exhibit transgenerational epigenetic inheritance (Table 1). Prolonged heat stress can induce transgenerational memory of the release of PTGS and attenuated immunity in Arabidopsis, which is mediated by a coordinated epigenetic network involving histone demethylases, heat shock transcription factors and trans-acting siRNAs (tasiRNAs) (Zhong et al., 2013; Liu et al., 2019). The release of TGS induced by cold stress and harsh UV-B treatment remains detectable, to a limited extent, for two non-stressed progeny generations (Lang-Mladek et al., 2010). The UV-C-mediated activation of some transposons can also be maintained for two generations in the absence of stress, which requires DCL proteins (Migicovsky and Kovalchuk, 2014). Upon exposure to heavy metal stress, the 5mC state of a Tos17 retrotransposon is altered and shows transgenerational inheritance in rice (Cong et al., 2019). Moreover, heavy metal-transporting P-type ATPase genes (HMAs) are up-regulated under heavy metal stress, and this up-regulation is transgenerationally memorized in the unstressed progeny (Cong et al., 2019). Drought stress applied over successive generations from the tillering to grain-filling stages induces non-random epimutations, and over 44.8% of drought-induced epimutations transmit their altered DNA methylation status to unstressed progeny. Epimutation-related genes directly participate in stress-responsive pathways, which may mediate the adaptation of rice plants to drought stress (Zheng et al., 2017). These transgenerational memories may offer the progeny an adaptive advantage or genomic flexibility for better fitness under diverse abiotic stresses.

Stress-induced transgenerational memory has also been reported in some asexual perennial plants. In genetically identical apomictic dandelion (Taraxacum officinale) plants, various stresses triggered considerable methylation variation throughout the genome, and many modifications were transmitted to unstressed offspring (Verhoeven et al., 2010). In two different apomictic dandelion lineages of the Taraxacum officinale group (Taraxacum alatum and T. hemicyclum) under drought stress or after salicylic acid (SA) treatment, heritable DNA methylation variations are observed across three generations irrespective of the initial stress treatment (Preite et al., 2018). It should be noted that these stress-induced transgenerational DNA methylation variations in dandelions are genotype- and context-specific and are not targeted to specific loci (Preite et al., 2018). Unlike most annual plants, asexual perennial plants use clonal propagation. The stress-induced DNA methylation variations may be largely inherited during mitosis, which may enable the next-generation plants to respond accurately and efficiently to adverse environmental factors in some habitats (Latzel et al., 2016). How the methylation variations contribute to phenotypic variation in asexual perennial plants remains to be investigated.

In mammals, both the paternal and maternal genomes undergo extensive DNA demethylation in the germline and at the early embryo stage, via both active and passive demethylation pathways, which leaves very little possibility for the inheritance of stress-induced changes in the methylome (Smith et al., 2012). Some examples of stress-induced transgenerational epigenetic inheritance have been reported in animals such as Caenorhabditis elegans, where the underlying epigenetic marks are mostly histone modifications or small RNAs (Skvortsova et al., 2018). In plants, however, DNA methylation is not erased but rather epigenetically inherited during reproduction (Feng et al., 2010; Calarco et al., 2012; Heard and Martienssen, 2014), suggesting a potential role of DNA methylation in transgenerational memory. In successive generations of met1-3 mutants deficient in maintaining CG methylation, the loss of mCG progressively triggers new and aberrant genome-wide epigenetic patterns in a stochastic manner, such as RdDM, decreased expression of DNA demethylases and retargeting of H3K9 methylation (Mathieu et al., 2007). Upon potato spindle tuber viroid (PSTVd) infection in tobacco, the body of the PSTVd transgene is densely de novo methylated in all three sequence contexts. However, in the viroid-free progeny plants, only mCG can be stably maintained for at least two generations independent of the RdDM triggers (Dalakouras et al., 2012). Thus, CG methylation may function as a central coordinator to secure stable abiotic transgenerational memory. In a population of epigenetic recombinant inbred lines (epiRILs) with epigenetically mosaic chromosomes derived from wild type and met1-3, which are nearly isogenic but highly variable at the level of DNA methylation, unexpectedly high frequencies of non-parental methylation polymorphisms are interspersed in the genome despite eight generations of inbreeding (Reinders et al., 2009). In F5 individual plants of ddm1 epiRILs, restoration of wild-type methylation is specific to a subset of heavily methylated repeats targeted by the RNA interference (RNAi) machinery (Teixeira et al., 2009). Consistent with this, in NRPD1 complementation lines of Arabidopsis, the DNA methylation of a subset of RdDM target loci cannot be restored even by the 20th generation. Many of these non-complemented DMRs overlap with epi-alleles defined in inbreeding experiments or natural accessions, which are functional in plant defense responses (Li et al., 2020). Under salt, drought and increased nutrient conditions in Arabidopsis thaliana, ddm1 epiRILs exhibit phenotypic variation in root allocation, nutrient plasticity, and drought and salt stress tolerance (Zhang et al., 2013; Kooke et al., 2015). These reports reinforce the idea that heritable variation in 5mC in epiRILs may allow the generation of epi-allelic variation, which has potential adaptive and evolutionary value. However, while the descendants of drought-stressed Arabidopsis lineages exhibit transgenerational memory of increased seed dormancy, the memory is not associated with causative changes in the DNA methylome (Ganguly et al., 2017).

Overall, although epigenetic regulation undoubtedly has potential roles in transgenerational memory, the roles of stress-induced DNA methylation variations in the persistence of transgenerational inheritance remain to be further elucidated. The extent to which locus-specific methylation changes might contribute to the maintenance of stress memory also remains unclear. The de novo methylation of a particular region can be set up by RdDM, and DNA methylation maintenance consolidates RdDM over generations in Arabidopsis thaliana, thereby establishing epigenetic memory (Kuhlmann et al., 2014). In ddm1 epiRILs, several DMRs are identified as bona fide epigenetic quantitative trait loci (QTLepi), accounting for 60–90% of the heritability for flowering time and primary root length (Cortijo et al., 2014). Whether the inheritance of DMRs induced by abiotic stress contributes to transgenerational inheritance requires further investigation. In addition, whether abiotic stress-induced 6mA changes can be inherited, and what roles they play in stress memory, remains elusive.


Abstract

We consider the correction of errors from nucleotide sequences produced by next-generation targeted amplicon sequencing. The next-generation sequencing (NGS) platforms can provide a great deal of sequencing data thanks to their high throughput, but the associated error rates often tend to be high. Denoising in high-throughput sequencing has thus become a crucial process for boosting the reliability of downstream analyses. Our methodology, named DUDE-Seq, is derived from a general setting of reconstructing finite-valued source data corrupted by a discrete memoryless channel and effectively corrects substitution and homopolymer indel errors, the two major types of sequencing errors in most high-throughput targeted amplicon sequencing platforms. Our experimental studies with real and simulated datasets suggest that the proposed DUDE-Seq not only outperforms existing alternatives in terms of error-correction capability and time efficiency, but also boosts the reliability of downstream analyses. Further, the flexibility of DUDE-Seq enables its robust application to different sequencing platforms and analysis pipelines by simple updates of the noise model. DUDE-Seq is available at http://data.snu.ac.kr/pub/dude-seq.

Citation: Lee B, Moon T, Yoon S, Weissman T (2017) DUDE-Seq: Fast, flexible, and robust denoising for targeted amplicon sequencing. PLoS ONE 12(7): e0181463. https://doi.org/10.1371/journal.pone.0181463

Editor: Junwen Wang, Mayo Clinic Arizona, UNITED STATES

Received: March 20, 2017 Accepted: June 30, 2017 Published: July 27, 2017

Copyright: © 2017 Lee et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper, its Supporting Information files, and its supporting website (http://data.snu.ac.kr/pub/dude-seq). All the used datasets are also available on the Sequence Read Archive (SRA) under the accession number SRP000570 (SRS002051–SRS002053) at https://www.ncbi.nlm.nih.gov/sra/SRP000570 and the European Nucleotide Archive (ENA) under the accession number PRJEB6244 (ERS671332–ERS671344) at http://www.ebi.ac.uk/ena/data/view/PRJEB6244.

Funding: This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (Ministry of Science, ICT and Future Planning) [2014M3A9E2064434 and 2016M3A7B4911115], in part by a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare [HI14C3405030014], in part by the Basic Science Research Program through the National Research Foundation of Korea [NRF-2016R1C1B2012170], in part by the ICT R&D program of MSIP/IITP [2016-0-00563, Research on Adaptive Machine Learning Technology Development for Intelligent Autonomous Digital Companion], and in part by NIH Grant 5U01CA198943-03. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.


References

1. Ottley C. Heredity and varicose veins. Br Med J 19341:528.

2. Wagner FB, Herbut PA. Etiology of primary varicose veins. Am J Surg 194978:876-80.

3. Vanhoutte PM, Corcaud S, de Montrion C. Venous disease: from pathophysiology to quality of life. Angiology 199748:559-67.

4. Meissner MH, Gloviczki P, Bergan J, et al. Primary chronic venous disorders. J Vasc Surg 200746 Suppl S:54S-67S.

5. Hauge M, Gundersen J. Genetics of varicose veins of the lower extremities. Hum Hered 196919:573-80.

6. Matousek V, Prerovský I. A contribution to the problem of the inheritance of primary varicose veins. Hum Hered 197424:225-35.

7. Cornu-Thenard A, Boivin P, Baud JM, De Vincenzi I, Carpentier PH. Importance of the familial factor in varicose disease. Clinical study of 134 families. J Dermatol Surg Oncol 199420:318-26.

8. Guo Q, Guo C. . Genetic analysis of varicose vein of lower extremities. Zhonghua Yi Xue Yi Chuan Xue Za Zhi 199815:221-3. (in Chinese)

9. Serra R, Buffone G, de Franciscis A, et al. A genetic study of chronic venous insufficiency. Ann Vasc Surg 201226:636-42.

10. Pistorius MA. Chronic venous insufficiency: the genetic influence. Angiology 200354 Suppl 1:S5-12.

11. Krysa J, Jones GT, van Rij AM. Evidence for a genetic role in varicose veins and chronic venous insufficiency. Phlebology 201227:329-35.

12. Smetanina MA, Shadrina AS, Zolotukhin IA, Filipenko MI. The genetic base of chronic venous disease: a review of modern concepts. Flebol 201610:199. (in Russian).

13. Shadrina AS, Smetanina MA, Sevost'ianova KS, et al. Polymorphic variants rs13155212 (T/C) and rs7704267 (G/C) in the AGGF1 gene and risk of varicose veins of the lower extremities in the population of ethnic Russians. Bull Exp Biol Med 2016161:698-702.

14. Shadrina AS, Sevost'ianova KS, Shevela AI, et al. Polymorphisms in the MTHFR and MTR genes and the risk of varicose veins in ethnical Russians. Biomarkers 201621:619-24.

15. Shadrina AS, Smetanina MA, Sevost'yanova KS, et al. Polymorphism of matrix metalloproteinases genes MMP1, MMP2, MMP3, and MMP7 and the risk of varicose veins of lower extremities. Bull Exp Biol Med 2017163:650-4.

16. Sokolova EA, Shadrina AS, Sevost'ianova KS, et al. HFE p.C282Y gene variant is associated with varicose veins in Russian population. Clin Exp Med 201616:463-70.

17. Shadrina AS, Smetanina MA, Sokolova EA, et al. Association of polymorphisms near the FOXC2 gene with the risk of varicose veins in ethnic Russians. Phlebology 201631:640-8.

18. Shadrina AS, Smetanina MA, Sokolova EA, et al. Allele rs2010963 C of the VEGFA gene is associated with the decreased risk of primary varicose veins in ethnic Russians. Phlebology 201833:27-35.

19. Shadrina AS, Smetanina MA, Sevost'ianova KS, et al. Functional polymorphism rs1024611 in the MCP1 gene is associated with the risk of varicose veins of lower extremities. J Vasc Surg Venous Lymphat Disord 20175:561-6.

20. Shadrina A, Voronina E, Smetanina M, et al. Polymorphisms in inflammation-related genes and the risk of primary varicose veins in ethnic Russians. Immunol Res 201866:141-50.

21. Bell RK, Durand EY, McLean CY, Eriksson N, Tung JY, et al. A large scale genome wide association study of varicose veins in the 23andMe cohort. In: The 64th Annual Meeting of The American Society of Human Genetics 2014 Oct 18-22 San Diego, USA: ASHG. Paper no. 2082M p. 487.

22. Ellinghaus E, Ellinghaus D, Krusche P, et al. Genome-wide association analysis for chronic venous disease identifies EFEMP1 and KCNH8 as susceptibility loci. Sci Rep 20177:45652.

23. Shevela AI, Gavrilov KA, Plotnikova EY, Sevostianova KS, Filipenko ML, Smetanina MA. Altered expression of the extracellular matrix related genes COL15A1, CHRDL2, EFEMP1, and TIMP1 in Varicose Veins. Eur J Vasc Endovasc Surg 202060:e75-6.

24. Shadrina A, Tsepilov Y, Sokolova E, et al. Genome-wide association study in ethnic Russians suggests an association of the MHC class III genomic region with the risk of primary varicose veins. Gene 2018659:93-9.

25. Shadrina A, Tsepilov Y, Smetanina M, et al. Polymorphisms of genes involved in inflammation and blood vessel development influence the risk of varicose veins. Clin Genet 201894:191-9.

26. Fukaya E, Flores AM, Lindholm D, et al. Clinical and genetic determinants of varicose veins. Circulation 2018138:2869-80.

27. Shadrina AS, Sharapov SZ, Shashkova TI, Tsepilov YA. Varicose veins of lower extremities: Insights from the first large-scale genetic study. PLoS Genet 201915:e1008110.

28. Smetanina MA, Shadrina AS, Zolotukhin IA, Seliverstov EI, Filipenko ML. Differentially expressed genes in varicose veins disease: current state of the problem, analysis of the Published Data. Flebol 201711:190. (in Russian).

29. Smetanina M, Sipin F, Seliverstov E, Zolotukhin I, Filipenko M. Differentially Expressed genes in lower limb varicose vein disease. Flebol 202014:122. (in Russian).

30. Smetanina MA, Kel AE, Sevost'ianova KS, et al. DNA methylation and gene expression profiling reveal MFAP5 as a regulatory driver of extracellular matrix remodeling in varicose vein disease. Epigenomics 201810:1103-19.

31. Smetanina MA, Sipin FA, Sevostyanova KS, Khrapov EA, Zolotukhin IA, Filipenko ML. . Two CpG loci in the regulatory regions of the MFAP5 gene are hypomethylated in varicose veins. In: ABSTRACTS of the International Union of Phlebology Chapter Meeting 2019 Aug 25-27 Krakow, Poland. Phlebological Review 20191:23-4.

32. Smetanina MA, Shevela AI, Gavrilov KA, Filipenko ML. . Modified methylation of the DNA loci related to the genes HRC, DPEP2, and CCN5 in varicose veins. In: BOOK OF ABSTRACTS of the 13th St. Petersburg Venous Forum (Christmas Meetings) 2020 Dec 4-5 St. Petersburg, Russia. ADVANCED PROBLEMS IN PHLEBOLOGY 2020. p. 11-12.

33. Lim JP, Brunet A. Bridging the transgenerational gap with epigenetic memory. Trends Genet 201329:176-86.

34. Legoff L, D'Cruz SC, Tevosian S, Primig M, Smagulova F. Transgenerational Inheritance of Environmentally Induced Epigenetic Alterations during Mammalian Development. Cells 20198:1559.

35. Bradbury J. Human epigenome project--up and running. PLoS Biol 20031:E82.

36. Ashar FN, Zhang Y, Longchamps RJ, et al. Association of mitochondrial DNA copy number with cardiovascular disease. JAMA Cardiol 20172:1247-55.

37. Smetanina MA, Sevost’ianova KS, Shirshova AN, et al. . Quantitative and structural characteristics of mitochondrial DNA in varicose veins. In: SCIENTIFIC PROGRAMME AND BOOK OF ABSTRACTS of the 20th Annual Meeting of the European Venous Forum 2019 June 27-29 Zurich, Switzerland. Edizioni Minerva Medica24.

38. Castellani CA, Longchamps RJ, Sumpter JA, et al. Mitochondrial DNA copy number can influence mortality and cardiovascular disease via methylation of nuclear DNA CpGs. Genome Med 2020;12:84.

39. Kharkevich DA. Venotropic (phlebotropic) agents. Eksp Klin Farmakol 2004;67:69-77. (in Russian) [PMID: 15079914]

40. Felixsson E, Persson IA, Eriksson AC, Persson K. Horse chestnut extract contracts bovine vessels and affects human platelet aggregation through 5-HT(2A) receptors: an in vitro study. Phytother Res 2010;24:1297-301.

41. Feldo M, Wójciak-Kosior M, Sowa I, et al. Effect of Diosmin Administration in Patients with Chronic Venous Disorders on Selected Factors Affecting Angiogenesis. Molecules 2019;24:3316.

42. Ivanov V, Roomi MW, Kalinovsky T, Niedzwiecki A, Rath M. Bioflavonoids effectively inhibit smooth muscle cell-mediated contraction of collagen matrix induced by angiotensin II. J Cardiovasc Pharmacol 2005;46:570-6.

43. Zheng Y, Zhang R, Shi W, et al. Metabolism and pharmacological activities of the natural health-benefiting compound diosmin. Food Funct 2020;11:8472-92.

44. Coccheri S, Mannello F. Development and use of sulodexide in vascular diseases: implications for treatment. Drug Des Devel Ther 2013;8:49-65.

45. Zolotukhin IA, Porembskaya OY, Smetanina MA, Sazhin AV, Filipenko ML, Kirienko AI. Varicose veins: on the verge of discovering the cause? Annals RAMS 2020;75:36-45. (in Russian).

46. Maggioli A. Chronic venous disorders: pharmacological and clinical aspects of micronized purified flavonoid fraction. Phlebolymphology 2016;23:82-91.

47. Ramelet AA. Venoactive Drugs. In: Goldman MP, Weiss RA, editors. Sclerotherapy 6th ed. Treatment of Varicose and Telangiectatic Leg Veins. Elsevier; 2017. pp. 426-34.

48. Mansilha A, Sousa J. Pathophysiological Mechanisms of Chronic Venous Disease and Implications for Venoactive Drug Therapy. Int J Mol Sci 2018;19:1669.

49. Paysant J, Sansilvestri-Morel P, Bouskela E, Verbeuren TJ. Different flavonoids present in the micronized purified flavonoid fraction (Daflon 500 mg) contribute to its anti-hyperpermeability effect in the hamster cheek pouch microcirculation. Int Angiol 2008;27:81-5. [PMID: 18277344].

50. Yanushko VA, Bayeshko AA, Sushkov SA, Nebylitsyn YS, Nazaruk AM. Benefits of MPFF on primary chronic venous disease-related symptoms and quality of life: the DELTA study. Phlebolymphology 2014;21:146-51.

51. Graças C de Souza M, Cyrino FZ, de Carvalho JJ, Blanc-Guillemaud V, Bouskela E. Protective Effects of Micronized Purified Flavonoid Fraction (MPFF) on a Novel Experimental Model of Chronic Venous Hypertension. Eur J Vasc Endovasc Surg 2018;55:694-702.

52. Kakkos SK, Nicolaides AN. Efficacy of micronized purified flavonoid fraction (Daflon®) on improving individual symptoms, signs and quality of life in patients with chronic venous disease: a systematic review and meta-analysis of randomized double-blind placebo-controlled trials. Int Angiol 2018;37:143-54.

53. Rodnyansky DV, Fokin AA. [Diosmin-containing phlebotropic drugs in varicose eczema]. Angiol Sosud Khir 2019;25:88-92.

54. Kurginyan HM, Raskin VV. Modern view on the therapy of chronic venous insufficiency with micronized purified flavonoid fraction. Cardiovasc Ther Prev 2020;19:2592.

55. Ponomarev ÉA, Strepetov NN, Sotnikov IE, et al. [Use of Detravenol in treatment of chronic venous insufficiency of lower limbs]. Angiol Sosud Khir 2020;26:95-102.

56. Raffetto JD, Eberhardt RT, Dean SM, Ligi D, Mannello F. Pharmacologic treatment to improve venous leg ulcer healing. J Vasc Surg Venous Lymphat Disord 2016;4:371-4.

57. Bush R, Comerota A, Meissner M, Raffetto JD, Hahn SR, Freeman K. Recommendations for the medical management of chronic venous disease: The role of Micronized Purified Flavanoid Fraction (MPFF). Phlebology 2017;32:3-19.

58. Melin MM, Dean SM. RE: A literature review of pharmacological agents to improve venous leg ulcer healing. Letter to the Editor. Wounds 2020;32:A10.

59. U.S. Department of Health & Human Services, National Institutes of Health, NCATS. Inxight: Drugs. Vasculera. Available from: https://drugs.ncats.io/drug/Z7R65IFU98. [Last accessed on 12 Mar 2021].

60. Casili G, Lanza M, Campolo M, et al. Therapeutic potential of flavonoids in the treatment of chronic venous insufficiency. Vascul Pharmacol 2021;137:106825.

61. U.S. Department of Health & Human Services, National Institutes of Health, NCATS. Inxight: Drugs. Diosmin. Available from: https://drugs.ncats.io/drug/7QM776WJ5N. [Last accessed on 12 Mar 2021].

62. Sirtori CR. Aescin: pharmacology, pharmacokinetics and therapeutic profile. Pharmacol Res 2001;44:183-93.

63. Stücker M, Debus ES, Hoffmann J, et al. Consensus statement on the symptom-based treatment of chronic venous diseases. J Dtsch Dermatol Ges 2016;14:575-83.

64. Peralta GR, Arévalo Gardoqui J, Llamas Macías FJ, Navarro Ceja VH, Mendoza Cisneros SA, Martínez Macías CG. Clinical and capillaroscopic evaluation in the treatment of chronic venous insufficiency with Ruscus aculeatus, hesperidin methylchalcone and ascorbic acid in venous insufficiency treatment of ambulatory patients. Int Angiol 2007;26:378-84.

65. Almeida Cyrino FZG, Balthazar DS, Sicuro FL, Bouskela E. Effects of venotonic drugs on the microcirculation: Comparison between Ruscus extract and micronized diosmine. Clin Hemorheol Microcirc 2018;68:371-82.

66. Kiesewetter H, Koscielny J, Kalus U, et al. Efficacy of orally administered extract of red vine leaf AS 195 (folia vitis viniferae) in chronic venous insufficiency (stages I-II). A randomized, double-blind, placebo-controlled trial. Arzneimittelforschung 2000;50:109-17.

67. Rabe E, Stücker M, Esperester A, Schäfer E, Ottillinger B. Efficacy and tolerability of a red-vine-leaf extract in patients suffering from chronic venous insufficiency--results of a double-blind placebo-controlled study. Eur J Vasc Endovasc Surg 2011;41:540-7.

68. Elleuch N, Zidi H, Bellamine Z, Hamdane A, Guerchi M, Jellazi N; CVD study investigators. Sulodexide in Patients with Chronic Venous Disease of the Lower Limbs: Clinical Efficacy and Impact on Quality of Life. Adv Ther 2016;33:1536-49.

69. Chupin AV, Katorkin SE, Katel’nitskiĭ II, et al. Sulodexide in treatment of chronic venous insufficiency. Results of the All-Russian multicenter programme ACVEDUCT. Angiol Sosud Khir 2018;24:47-55. (in Russian).

70. Carroll BJ, Piazza G, Goldhaber SZ. Sulodexide in venous disease. J Thromb Haemost 2019;17:31-8.

71. Gohil KJ, Patel JA, Gajjar AK. Pharmacological Review on Centella asiatica: A Potential Herbal Cure-all. Indian J Pharm Sci 2010;72:546-56.

72. Martinez-Zapata MJ, Vernooij RW, Simancas-Racines D, et al. Phlebotonics for venous insufficiency. Cochrane Database Syst Rev 2020;11:CD003229.

73. Karetová D, Suchopár J, Bultas J. Diosmin/hesperidin: a cooperating tandem, or is diosmin crucial and hesperidin an inactive ingredient only? Vnitr Lek 2020;66:97-103.

74. Lust L, Kuk H, Kohlhaas J, Sticht C, Korff T. Inhibition of cyclooxygenase activity by diclofenac attenuates varicose remodeling of mouse veins. Vessel Plus 2021;5:7.