Introns are sections of noncoding DNA that separate exons within a gene locus. However, between different gene loci, I also would assume there to be noncoding regions of DNA. What are these regions called? (And if my assumption is wrong, then please correct me.)
These regions, if unannotated, are simply called intergenic regions.
Sometimes, if a large section of chromatin is regulated by an enhancer/silencer/locus control element, then there are boundary elements that demarcate this chromatin region and prevent the spread of the chromatin state to neighbouring regions.
It would be valid to call them "intergenic regions", but this is just another way of saying that they are the regions between genes. I don't think you can do much more because they are not homogeneous in nature. Some may be short and perhaps lack any function other than that described by @WYSIWYG; others may be very long and contain enhancers and regulatory elements for proximal or distal genes, simple repeated sequences, pseudogenes, replication origins, and so on.
From Loci to Biology: Functional Genomics of Genome-Wide Association for Coronary Disease
Genome-wide association studies have provided a rich collection of ≈ 58 coronary artery disease (CAD) loci that suggest the existence of previously unsuspected new biology relevant to atherosclerosis. However, these studies only identify genomic loci associated with CAD, and many questions remain even after a genomic locus is definitively implicated, including the nature of the causal variant(s) and the causal gene(s), as well as the directionality of effect. There are several tools that can be used for investigation of the functional genomics of these loci, and progress has been made on a limited number of novel CAD loci. New biology regarding atherosclerosis and CAD will be learned through the functional genomics of these loci, and the hope is that at least some of these new pathways relevant to CAD pathogenesis will yield new therapeutic targets for the prevention and treatment of CAD.
Keywords: atherosclerosis coronary artery disease functional genome-wide genomics.
© 2016 American Heart Association, Inc.
Figure 1. Mechanism by which non-coding risk SNP can affect phenotype
Figure 2. Experimental tools for GWAS functional follow-up studies
Figure 3. CHD GWAS risk genes are active in selective cell types involved in atherosclerosis
Chapter 13 Mastering Biology
One homologous chromosome comes from the father, and the other comes from the mother. Sister chromatids are identical copies of each other.
At the end of meiosis I there are two haploid cells.
At the end of meiosis II there are typically 4 haploid cells.
At the end of telophase I and cytokinesis, there are two haploid cells with chromosomes that consist of two sister chromatids each.
Synapsis, the pairing of homologous chromosomes, occurs during prophase I.
During anaphase I sister chromatids remain attached at their centromeres, and homologous chromosomes move to opposite poles.
Metaphase II is essentially the same as mitotic metaphase except that the cell is haploid.
At the end of telophase II and cytokinesis there are four haploid cells.
Prophase II is essentially the same as mitotic prophase except that the cells are haploid.
In mitosis a cell that has doubled its genetic material divides to produce two diploid daughter cells. In meiosis a cell that has doubled its genetic material undergoes two rounds of division, producing four haploid cells.
The pairing of homologous chromosomes that only occurs during prophase I of meiosis is called synapsis.
Discovery of target genes and pathways of blood trait loci using pooled CRISPR screens and single cell RNA sequencing
The majority of variants associated with complex traits and common diseases identified by genome-wide association studies (GWAS) map to noncoding regions of the genome with unknown regulatory effects in cis and trans. By leveraging biobank-scale GWAS data, massively parallel CRISPR screens and single cell transcriptome sequencing, we discovered target genes of noncoding variants for blood trait loci. The closest gene was often the target gene, but this was not always the case. We also identified trans-effects networks of noncoding variants when cis target genes encoded transcription factors, such as GFI1B and NFE2. We observed that GFI1B trans-target genes were enriched for GFI1B binding sites and fine-mapped GWAS variants, and expressed in human bone marrow progenitor cells, suggesting that GFI1B acts as a master regulator of blood traits. This platform will enable massively parallel assays to catalog the target genes of human noncoding variants in both cis and trans.
Computational Methods for Genetics of Complex Traits
Robert Culverhouse , in Advances in Genetics , 2010
1 Two examples
The first is a theoretical illustration of the fact that the magnitude of an interaction effect in a genetic analysis is not an inherent property of the genetic model, but depends on allele frequencies in the data. Templeton (2000) constructed a two-locus genetic model for total serum cholesterol (TSC) consistent with previously reported data indicating an interaction between ApoE and LDLR (Pedersen and Berg, 1989; Pedersen and Berg, 1990). Analysis of this model based on European allele frequencies (ApoE frequencies of 0.078, 0.77, and 0.152 for ε2, ε3, and ε4, respectively, and LDLR frequencies of 0.22 and 0.78 for A1 and A2, respectively) would suggest that ApoE is the “major gene” for TSC, accounting for 77.7% of the genetic variance (52.8 mg²/dl²), while LDLR is a minor player, accounting for only 5.5% of the genetic variance. The remaining 16.8% of the genetic variance could be attributed to an interaction, and would only be detected if the variants were analyzed jointly.
Templeton then evaluated the same genetic model in a hypothetical population with different allele frequencies (ApoE frequencies of 0.02, 0.03, and 0.95 for ε2, ε3, and ε4, respectively, and LDLR frequencies of 0.5 for both A1 and A2). The analysis of this hypothetical population indicated that ApoE was only a minor contributor to the trait (accounting for 11.9% of the genetic variance), that LDLR was the “major gene” locus (accounting for 81.4% of the variance), and that the interaction was not very strong (accounting for only 6.4% of the variance).
These results suggest that when studying complex diseases, it may be more important to focus on identifying contributors to trait variation than to focus on estimating particular parameters, such as the size of the interaction term.
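The dependence of variance shares on allele frequencies can be illustrated with a small sketch. This is not Templeton's model: the genotypic values in MODEL, the two biallelic loci, and the allele frequencies are all hypothetical, and the loci are assumed to be in Hardy-Weinberg and linkage equilibrium. Under those assumptions the genetic variance splits into a part explained by each locus alone and a remainder attributed to interaction.

```python
from itertools import product


def genotype_freqs(p):
    """Hardy-Weinberg frequencies for 0, 1, 2 copies of an allele of frequency p."""
    q = 1.0 - p
    return [q * q, 2 * p * q, p * p]


def variance_shares(values, p_a, p_b):
    """Split genetic variance into locus A, locus B and interaction shares.

    values[i][j] is a hypothetical trait value for a genotype carrying
    i copies of the A allele and j copies of the B allele. The two loci
    are assumed independent (linkage equilibrium).
    """
    fa, fb = genotype_freqs(p_a), genotype_freqs(p_b)
    cells = list(product(range(3), repeat=2))
    mean = sum(fa[i] * fb[j] * values[i][j] for i, j in cells)
    total = sum(fa[i] * fb[j] * (values[i][j] - mean) ** 2 for i, j in cells)
    # Mean trait value for each genotype at one locus, averaged over the other.
    mean_a = [sum(fb[j] * values[i][j] for j in range(3)) for i in range(3)]
    mean_b = [sum(fa[i] * values[i][j] for i in range(3)) for j in range(3)]
    var_a = sum(fa[i] * (mean_a[i] - mean) ** 2 for i in range(3))
    var_b = sum(fb[j] * (mean_b[j] - mean) ** 2 for j in range(3))
    var_int = total - var_a - var_b  # what neither locus explains alone
    return {"A": var_a / total, "B": var_b / total, "interaction": var_int / total}


# Hypothetical model: locus B only matters when no A allele is present.
MODEL = [[0, 1, 2], [1, 1, 1], [2, 2, 2]]

pop1 = variance_shares(MODEL, p_a=0.5, p_b=0.5)
pop2 = variance_shares(MODEL, p_a=0.9, p_b=0.5)
print(pop1)
print(pop2)
```

The same table of genotypic values yields different shares for locus A, locus B, and the interaction in the two populations, which is the point of the example above: the shares describe the model in a given population, not the model itself.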
A second example of how one could miss something important by focusing too much on the significance of an interaction parameter was found in a recent examination of data on smoking. Standard univariate analysis identified an association between nicotine dependence and a nonsynonymous coding SNP, rs16969968, in CHRNA5, a nicotinic receptor subunit gene. The RPM also identified this SNP as highly significant (accounting for 1.22% of the phenotypic variation). However, in a pairwise analysis of nicotinic receptor SNPs, the RPM identified a second nearby variant, rs3743075, which was not significant on its own (either by the RPM or a standard univariate analysis), but which, when combined with rs16969968, accounted for 1.83% of the variance. A logistic regression using this pair of predictors, however, indicated that rs3743075 was not significant on its own (p = 0.36), nor was the interaction (p = 0.27).
However, even without an interaction term, the logistic model including both SNPs accounts for 25% more of the trait variation than the sum of the two univariate effects. In fact, each SNP became more significant in the joint analysis than it was in a univariate analysis.
The reason for this surprising result is that the risk alleles for rs16969968 and rs3743075 are negatively correlated. Since rs16969968 has a bigger effect than rs3743075, when rs3743075 is analyzed on its own, its effect is almost completely masked by rs16969968. When rs16969968 is analyzed alone, its effect is dampened a bit by rs3743075, but still shows through.
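The masking effect can be sketched with a simple linear model rather than the logistic model used in the study. The effect sizes and correlation below are hypothetical, and both predictors are assumed to be standardized (variance 1). Under those assumptions, the slope seen when a predictor is analyzed alone equals its own effect plus the other predictor's effect times the correlation between them.

```python
def marginal_slopes(b1, b2, r):
    """Slopes observed when each predictor is analyzed on its own.

    Assumes y = b1*x1 + b2*x2 + noise, with x1 and x2 standardized
    (variance 1) and correlated with coefficient r.
    """
    slope_x1_alone = b1 + r * b2
    slope_x2_alone = b2 + r * b1
    return slope_x1_alone, slope_x2_alone


# Hypothetical values: x1 has the larger effect, and the two are negatively correlated.
B1, B2, R = 1.0, 0.3, -0.3
alone1, alone2 = marginal_slopes(B1, B2, R)
print(f"x1 alone: {alone1:.2f} (joint effect {B1})")
print(f"x2 alone: {alone2:.2f} (joint effect {B2})")
```

With these numbers the larger effect is only slightly dampened when analyzed alone, while the smaller effect disappears entirely, mirroring the pattern described for the two SNPs.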
In this case, requiring either a significant main effect or a significant interaction effect would have resulted in a failure to identify the contribution of rs3743075 to nicotine dependence. As it turns out, though rs3743075 is not associated with nicotine dependence in univariate analyses, it is associated with expression for CHRNA5, the gene for which rs16969968 alters the protein.
What separates gene loci? - Biology
"Expression Quantitative Trait Loci Are Highly Sensitive to Cellular Differentiation State". PLOS Genetics 5 (10): e1000692. doi:10.1371/journal.pgen.1000692. PMC 2757904. PMID 19834560.
^ Michaelson JJ, Loguercio S, Beyer A (July 2009).
Notice that the alleles for the three different loci do not overlap. The lower panel shows the alleles for Bob Blackett's mother Norma for the D3S1358, vWA, and FGA loci. Norma's alleles have been compared by computer to the reference standards, and labeled.
Loci: See Locus.
Locus (pl. loci): The position on a chromosome of a gene or other chromosome marker also, the DNA at that position. The use of locus is sometimes restricted to mean regions of DNA that are expressed. The specific physical location of a gene on a chromosome. From the Latin for 'place'.
The position on a chromosome where a particular genetic trait resides. Sometimes used to describe multiple genes that affect the same function.
Lipopolysaccharide. A major component of the outer layer of the outer membrane of Gram-negative bacteria.
(loh-kus) [L., place]
A particular place along the length of a certain chromosome where a given gene is located.
for yield, yield components, heading date, plant height, and physiological and developmental traits under drought have also been established in mapping populations (Maccaferri et al., 2008; DZ Habash et al., unpublished data).
QTL analysis studies for agronomical traits .
on the same chromosome
Source: Jenkins, John B. 1990. Human Genetics, 2nd Edition. New York: Harper & Row
are composed of an AT-rich leader sequence followed by multiple, short nucleotide repeats separated by spacer regions. Many of the repeats are palindromic, with predicted RNA hairpin secondary structure, while others lack symmetry and are predicted to form unstructured RNA.
produce a combined effect it is called polygeny.)
Imagine height is controlled just by two genes (though in reality, many genes will contribute to height). Each has two alleles E and e, F and f.
(general) A place, space or locality, especially a centre of an activity.
(mathematics) The set of all points whose coordinates satisfy a given equation or condition.
a particular place along the length of a certain chromosome where a given gene is located.
Covered in BIOL1020 Lab 7 Genetics
a membrane-enclosed bag of hydrolytic enzymes found in the cytoplasm of eukaryotic cells.
7) Genes are located at particular locations on a chromosome called a locus. There can be different versions of a gene that exist in a population, which are called alleles.
) genotyped for all analysed mouse strains.
No. of SNPs .
are situated on the mitochondrial DNA
3. Correlation genotype-phenotype .
in one species (such as the mouse) have homologs that are also linked in another species (such as humans).
See Synteny in the MGI Glossary.
(Date:3/28/2011). have completed the first human randomized controlled trial . uses a catheter-based probe inserted into the renal . nerves near the kidneys (or in the renal . The researchers say these results confirm that RDN .
Distances are established by linkage analysis, which determines the frequency at which two gene
become separated during chromosomal recombination. (See Mapping.) Genetic marker. A gene or group of genes used to "mark" or track the action of microbes. Genome.
Homozygote -- having identical alleles at one or more
in homologous chromosome segments. Housekeeping genes -- those genes expressed in all cells because they provide functions needed for sustenance of all cell types. HUGO -- Human Genome Organization.
When the combined effects of alleles at different
are equal to the sum of their individual effects. (ORNL)
A nitrogenous base, one member of the base pair AT (adenine-thymine).
See also: base pair (ORNL)
A fixed procedure embodied in a computer program. (NCBI)
Average heterozygosity measures gene variability, the average percent of gene
are homozygous (fixed).
About 14% (1,800 genes) are heterozygous.
In yeast, Sir2 is required for genomic silencing at three
, the telomeres and the ribosomal DNA (rDNA).
are nearer to each other are less likely to be separated onto different chromatids during chromosomal crossover, and are therefore said to be genetically linked. (wikipedia.org) 3. The term refers to the fact that certain genes tend to be inherited together, because they are on the same chromosome.
One case of this phenomenon occurs at
at the major histocompatibility complex (MHC) wherein some human alleles are much more closely related to some chimpanzee alleles than they are to other human alleles (Fig. 7).
Gene interactionThe situation in which genes inherited at different
interact to produce red cell phenotypes, e.g., Le le genes interact with Hh and Se se genes to produce the various Lewis red cell phenotypes.
If we use height as an example, we can say it is controlled by additive genes at four different loci. If you get all eight alleles for "tallness" you will be at one extreme of height; if you get eight alleles for "shortness" you will be at the other end of the height spectrum.
X-linked methylation patterns Several
present on the X chromosome become highly methylated when inactive but remain unmethylated on the active X chromosome (Lyon hypothesis).
HE (expected heterozygosity) is also known as gene diversity (= D preferred, less ambiguous term) and is calculated as 1.
For example, MHC Class II are gene
. It is therefore a useful way to discern how closely two individuals are related.
The larger the distance between the loci of two genes in a chromosome, the higher the recombination frequency between these genes. This is true because when alleles are closer together within the chromosome, it is more probable that they will remain together when chromosomal ends are exchanged by crossing over.
. A map distance of 1 centiMorgan (cM) is equivalent to a recombination rate of 1%. For small distances, the recombination frequency (RF) is proportional to the map distance.
linkage /LINK-əj/ The increased tendency of two alleles at different
are closer to each other on a given chromosome (i.e.
As a result, specific alleles at two different
are found together more or less than expected by chance. The same situation may exist for more than two alleles. Its magnitude is expressed as the delta (D) value and corresponds to the difference between the expected and the observed haplotype frequency.
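As a rough illustration of the D value described above, the sketch below computes it from haplotype counts for two biallelic loci. The counts are hypothetical, and the sign convention used here (observed minus expected haplotype frequency) is an assumption; some sources define the difference the other way around.

```python
def linkage_disequilibrium(haplotype_counts):
    """Return D for haplotype AB at two biallelic loci.

    haplotype_counts maps the four haplotypes "AB", "Ab", "aB", "ab"
    to their observed counts. D is the observed frequency of AB minus
    the frequency expected if the two loci were independent.
    """
    total = sum(haplotype_counts.values())
    p_ab = haplotype_counts["AB"] / total
    p_a = (haplotype_counts["AB"] + haplotype_counts["Ab"]) / total
    p_b = (haplotype_counts["AB"] + haplotype_counts["aB"]) / total
    return p_ab - p_a * p_b


# Hypothetical sample of 100 haplotypes.
counts = {"AB": 40, "Ab": 20, "aB": 10, "ab": 30}
d_value = linkage_disequilibrium(counts)
print(f"D = {d_value:.3f}")
```

A value of zero would mean the alleles at the two loci occur together exactly as often as expected by chance; here AB is observed more often than expected.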
, in which linkage of a particular phenotype is quantitatively and statistically determined.
Randombred. Stocks of mice maintained by systematic scheme of randomization. Randombred mice are genetically heterogeneous, but genetically stable compared to outbred mice.
In genetics, the tendency of two different
are, the greater their linkage and the lower the frequency of recombination between them.
Of course, the above example is a simplified one based on a single locus with two fitness peaks in the adaptive landscape. In the real world, many, many
Gene family: A set of related genes occupying various
in the DNA, almost certainly formed by duplication of an ancestral gene and having a recognizably similar sequence. Members of a gene family may be functionally very similar or differ widely. The globin gene family is an example. (PBS evolution Glossary) .
Allelic arrangement of two linked heterozygous
, in which each homologous chromosome has one mutant (a or b) and one wild-type (A or B) allele (ie. Ab/aB). Two linked heterozygous gene pairs in the arrangement, Ab/aB.
such as immunoglobulin or T cell receptor (TCR) genes where a functional rearrangement among genes takes place. One of the alleles is either non-functionally or incompletely rearranged and not expressed. This way, each T-cell expresses only one set of TCR genes.
A dihybrid cross is a cross between individuals heterozygous at two different loci.
Mendel's second law is also known as the law of independent assortment. The law of independent assortment states that the alleles of one gene sort into gametes independently of the alleles of another gene.
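Independent assortment in a dihybrid cross can be sketched by enumerating gametes. The genotype encoding (one two-letter string per locus, uppercase for the dominant allele) and the function names are illustrative assumptions, and the two loci are assumed to be unlinked with complete dominance at each.

```python
from collections import Counter
from itertools import product


def gametes(genotype):
    """List the gametes of a parent, one allele drawn from each locus.

    genotype is a tuple of two-letter strings, e.g. ("Aa", "Bb").
    """
    return ["".join(alleles) for alleles in product(*genotype)]


def cross(parent1, parent2):
    """Count offspring phenotype classes, assuming complete dominance."""
    counts = Counter()
    for g1, g2 in product(gametes(parent1), gametes(parent2)):
        phenotype = tuple(
            "dominant" if (a1.isupper() or a2.isupper()) else "recessive"
            for a1, a2 in zip(g1, g2)
        )
        counts[phenotype] += 1
    return counts


offspring = cross(("Aa", "Bb"), ("Aa", "Bb"))
for phenotype, n in sorted(offspring.items()):
    print(phenotype, n)
```

Each parent produces four equally likely gametes, and the sixteen combinations fall into the familiar 9:3:3:1 phenotype ratio.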
(genes or other genetic markers) along a chromosome.
The general term for the production of offspring that combine traits of the two parents.
and which are capable of pairing during meiosis.
Homozygote a condition in which the alleles of a particular gene are identical.
Humus a black gum-like substance, derived from decayed plant and animal remains.
quantitative trait A genetic trait that is determined by multiple interacting
, and for which there is a range of phenotypes between phenotypic extremes.
quark A subatomic particle found in the nucleus of the atom.
- A measure of the tendency of some genes to be inherited as a group rather than individually because of the proximity of their
in the chromosome
- a phytochemical it is a powerful anti-oxidant that has been shown to neutralize free radicals. Lycopene belongs to the family of carotenoids.
Human beta-globin genes are scattered at five
on human chromosome 11. These genes are expressed sequentially during development, and are similar with same-length introns in similar positions in each gene. Some of the genes are inactivated copies, others are functional only during certain phases of development.
Modifier genes: Genes that affect the level of expression of another gene. Having no place on the
they attach themselves to other genes.
Phenotype: The outward appearance of an individual
Genotype: The genetic makeup of an individual .
An international consortium that seeks to identify genetic
that modulate human body size and shape:
Encyclopedia of Sports Medicine and Science:
Adipose tissue .
A more recent development of quantitative genetics is the analysis of quantitative trait
sequenced locus Sequenced
include predicted and experimentally verified transcribed regions.
nucleolus. A microscopically visible region, or 'compartment', within the cell nucleus. It contains a concentration of proteins and DNA loci required for the transcription of appropriate DNA sequences into ribosomal RNA (rRNA). Like other nuclear compartments, or 'bodies', the nucleolus is not membrane-bound.
A locus is the specific physical location of a gene or other DNA sequence on a chromosome, like a genetic street address. The plural of locus is "loci".
Similarities Between Genetic and Physical Mapping
- Genetic and physical mapping are two types of genome mapping techniques, producing different types of genome maps.
- They use a collection of molecular markers with respective positions on the genome.
- Both allow the identification of genes, which give rise to a particular phenotype or a mutation responsible for a specific variant.
- Also, genome mapping is the initial process of many downstream processes.
- As an example, it helps to identify genetic elements associated with diseases.
Principles of Gene Mapping & Practice Problems
A genetic map is simply a representation of the distribution of a set of loci within the genome. The loci included by an investigator in any one mapping project may bear no relation to each other at all, or they may be related according to any of a number of parameters including functional or structural homologies or a pre-determined chromosomal assignment. Mapping of these loci can be accomplished at many different levels of resolution. At the lowest level, a locus is simply assigned to a particular chromosome without any further localization. At a step above, an assignment may be made to a particular subchromosomal region. At a still higher level of resolution, the relative order and approximate distances that separate individual loci within a linked set can be determined. With ever-increasing levels of resolution, the order and interlocus distances can be determined with greater and greater precision. Finally, the ultimate resolution is attained when loci are mapped onto the DNA sequence itself.
The simplest genetic maps can contain information on as few as two linked loci. At the opposite extreme will be complete physical maps that depict the precise physical location of all of the thousands of genes that exist along an entire chromosome. The first step toward the generation of these complete physical maps has recently been achieved with the establishment of single contigs of overlapping clones across the length of two complete human chromosome arms.
There are actually not one but three distinct types of genetic maps that can be derived for each chromosome in the genome (other than the Y). The three types of maps (linkage, chromosomal, and physical) are distinguished both by the methods used for their derivation and by the metric used for measuring distances within them.
The linkage map, also referred to as a recombination map, was the first to be developed soon after the re-discovery of Mendel’s work at the beginning of the 20th century. Linkage maps can only be constructed for loci that occur in two or more heritable forms, or alleles. Thus, monomorphic loci — those with only a single allele — cannot be mapped in this fashion. Linkage maps are generated by counting the number of offspring that receive either parental or recombinant allele combinations from a parent that carries two different alleles at two or more loci. Analyses of this type of data allow one to determine whether loci are “linked” to each other and, if they are, their relative order and the relative distances that separate them.
A chromosomal assignment is accomplished whenever a new locus is found to be in linkage with a previously assigned locus. Distances are measured in centimorgans, with one centimorgan equivalent to a crossover rate of 1%. The linkage map is the only type based on classical breeding analysis. The term “genetic map” is sometimes used as a false synonym for “linkage map”; a genetic map is actually more broadly defined to include both chromosomal and physical maps as well.
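The counting procedure behind a linkage map can be sketched as follows. The offspring counts are hypothetical, and the direct conversion of recombination percentage to centimorgans is a simplification that is reasonable only for closely linked loci, where double crossovers are rare.

```python
def recombination_frequency(parental, recombinant):
    """Fraction of offspring that carry a recombinant allele combination."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("no offspring were scored")
    return recombinant / total


def map_distance_cm(parental, recombinant):
    """Approximate map distance in centimorgans (1 cM = 1% recombination)."""
    return 100.0 * recombination_frequency(parental, recombinant)


# Hypothetical backcross: 92 parental-type and 8 recombinant offspring.
distance = map_distance_cm(parental=92, recombinant=8)
print(f"{distance:.1f} cM")
```

Two loci that assort independently give a recombination frequency near 50%, so values close to that limit indicate no detectable linkage rather than a measured distance.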
The chromosome map (or cytogenetic map) is based on the karyotype of the mouse genome. All mouse chromosomes are defined at the cytogenetic level according to their size and banding pattern, and ultimately, all chromosomal assignments are made by direct cytogenetic analysis or by linkage to a locus that has previously been mapped in this way. Chromosomal map positions are indicated with the use of band names. Inherent in this naming scheme is a means for ordering loci along the chromosome.
Today, several different approaches, with different levels of resolution, can be used to generate chromosome maps. First, in some cases, indirect mapping can be accomplished with the use of one or more somatic cell hybrid lines that contain only portions of the mouse karyotype within the milieu of another species’ genome. By correlating the presence or expression of a particular mouse gene with the presence of a mouse chromosome or subchromosomal region in these cells, one can obtain a chromosomal, or subchromosomal, assignment.
The second approach can be used in those special cases where karyotypic abnormalities appear in conjunction with particular mutant phenotypes. When the chromosomal lesion and the phenotype assort together, from one generation to the next, it is likely that the former causes the latter. When the lesion is a deletion, translocation, inversion, or duplication, one can assign the mutant locus to the chromosomal band that has been disrupted.
Finally, with the availability of a locus-specific DNA probe, it becomes possible to use the method of in situ hybridization to directly visualize the location of the corresponding sequence within a particular chromosomal band. This approach is not dependent on correlations or assumptions of any kind and, as such, it is the most direct mapping approach that exists. However, it is technically demanding and its resolution is not nearly as high as that obtained with linkage or physical approaches.
The third type of map is a physical map. All physical maps are based on the direct analysis of DNA. Physical distances between and within loci are measured in basepairs (bp), kilobasepairs (kb) or megabasepairs (mb). Physical maps are arbitrarily divided into short range and long range. Short range mapping is commonly pursued over distances ranging up to 30 kb. In very approximate terms, this is the average size of a gene and it is also the average size of cloned inserts obtained from cosmid-based genomic libraries. Cloned regions of this size can be easily mapped to high resolution with restriction enzymes and, with advances in sequencing technology, it is becoming more common to sequence interesting regions of this length in their entirety.
Direct long-range physical mapping can be accomplished over megabase-sized regions with the use of rare-cutting restriction enzymes together with various methods of gel electrophoresis referred to generically as pulsed field gel electrophoresis or PFGE, which allow the separation and sizing of DNA fragments of 6 mb or more in length. PFGE mapping studies can be performed directly on genomic DNA followed by Southern blot analysis with probes for particular loci. It becomes possible to demonstrate physical linkage whenever probes for two loci detect the same set of large restriction fragments upon sequential hybridizations to the same blot.
Long-range mapping can also be performed with clones obtained from large insert genomic libraries such as those based on the yeast artificial chromosome (YAC) cloning vectors, since regions within these clones can be readily isolated for further analysis. In the future, long-range physical maps consisting of overlapping clones will cover each chromosome in the mouse genome. Short-range restriction maps of high resolution will be merged together along each chromosomal length, and ultimately, perhaps, the highest level of mapping resolution will be achieved with whole chromosome DNA sequences.
Connections between maps
In theory, linkage, chromosomal, and physical maps should all provide the same information on chromosomal assignment and the order of loci. However, the relative distances that are measured within each map can be quite different. Only the physical map can provide an accurate description of the actual length of DNA that separates loci from each other. This is not to say that the other two types of maps are inaccurate. Rather, each represents a version of the physical map that has been modulated according to a different parameter. Cytogenetic distances are modulated by the relative packing of the DNA molecule into different chromosomal regions. Linkage distances are modulated by the variable propensity of different DNA regions to take part in recombination events.
In practice, genetic maps of the mouse are often an amalgamation of chromosomal, linkage, and physical maps, but at the time of this writing, it is still the case that classical recombination studies provide the great bulk of data incorporated into such integrated maps. Thus, the primary metric used to chart interlocus distances has been the centimorgan. However, it seems reasonable to predict that, within the next five years, the megabase will overtake the centimorgan as the unit for measurement along the chromosome.
We focus on the basic two-population isolation-with-migration model with six demographic parameters: three effective population sizes for the three populations (for populations 1, 2, and ancestral), two migration rates (one for each direction), and a time at which the ancestral population separated into the two descendant populations. We distinguish the splitting time, t, from those parameters that provide for the rates of specific types of events in the coalescent process (i.e., migration and population size parameters that we refer to collectively as Φ). Parameters are scaled by the mean mutation rate across loci μ, and hence the effective sizes are given by 4N_iμ, the migration rates by m_{i→j} = M_{i→j}/μ, and the time of split by t = Tμ, where N_i is the effective size of the ith population, M_{i→j} is the migration rate per generation between population i and j, and T is the time of split (in generations) (Hey and Nielsen 2004). In the case of multiple loci, each locus l will also have a mutation rate scalar u_l and an inheritance scalar h_l, hence modeling explicitly variation in the mutation rates and modes of inheritance across loci (Hey and Nielsen 2004). No recombination within loci and free recombination among loci are assumed.
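The scaling described above can be written out as a short sketch. The function name and all numerical values are hypothetical and chosen only to make the arithmetic easy to follow; they are not estimates from any data set.

```python
def scale_parameters(pop_sizes, migration_rates, split_generations, mu):
    """Scale demographic parameters by the mean mutation rate mu.

    pop_sizes: effective sizes N_i (populations 1, 2 and ancestral)
    migration_rates: per-generation rates M_{i->j}, one per direction
    split_generations: time of split T, in generations
    """
    return {
        "sizes": [4 * n * mu for n in pop_sizes],        # 4 * N_i * mu
        "migration": [m / mu for m in migration_rates],  # M_{i->j} / mu
        "t": split_generations * mu,                     # T * mu
    }


# Hypothetical unscaled values.
scaled = scale_parameters(
    pop_sizes=[10_000, 20_000, 15_000],
    migration_rates=[1e-6, 2e-6],
    split_generations=100_000,
    mu=1e-5,
)
print(scaled)
```

Working in scaled units means the six parameters can be estimated without knowing μ; converting back to individuals and generations requires an external estimate of the mutation rate.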
Theoretical studies have shown that the effects of selection on linked neutral sites are well approximated by a purely neutral process with a reduction in the migration rate, proportional to the barrier to gene flow caused by selection (Petry 1983; Barton and Bengtsson 1986; Charlesworth et al. 1997; Navarro and Barton 2003a; Fusco and Uyenoyama 2011). Similarly, neutral loci linked to regions of the genome under directional or background selection suffer reductions in their effective sizes proportional to the selective strength (Charlesworth et al. 1993; Galtier et al. 2000; Charlesworth 2009; Gossmann et al. 2011). Different modes of selection can thus be modeled by altered demographic parameters. For instance, selection against gene flow, resulting from either local adaptation or genetic incompatibilities in the hybrids, would be reflected as a reduction in the migration rates. Adaptive introgression, on the other hand, would lead to increased migration rates. Likewise, genomic regions undergoing repeated selective sweeps would be seen as having a reduced effective size. Therefore, we assume that the effects of selection on linked sites can be described in terms of altered migration rates and/or effective population sizes. We consider a model where loci are classified into groups with each group having its own set of migration rate and/or effective population size parameters, thus relaxing the assumption that all loci share the same demography. In this general framework the only one of the six demographic parameters that remains shared by all loci is t.
In principle the number of groups of loci could be treated as an unknown; however, we focus on the case where the maximum number of groups K is set by the investigator and specifically on the simplest case where loci can be classified into two groups (K = 2) representing (1) loci with histories affected by linkage to genes under selection and (2) loci not affected by selection. It is important to appreciate that the identification of a group as having loci affected by selection depends entirely upon how the investigator interprets the parameter estimates of the different groups of loci. Here we focus on the case of selection against gene flow, and hence the group of loci with reduced migration rate estimates corresponds to loci potentially linked to sites under selection. The assignment of loci to groups is represented by an assignment vector a, where a_l is the group to which locus l belongs, l = (1, … , L). For instance, in a case with four loci and two groups, a = (1, 1, 2, 2) indicates that loci 1 and 2 belong to group 1 and loci 3 and 4 belong to group 2. With more than one group of loci the set of migration and effective population size parameters, Φ, will include additional terms. In the above example, instead of one set of effective sizes (three parameters) and one pair of migration rates (two parameters), the model includes one set for each group, that is, two sets of effective sizes (six parameters) and two sets of migration rates (four parameters).
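The assignment vector and the resulting parameter count can be sketched as below. The function names are illustrative, and the count assumes that both the effective sizes and the migration rates differ between groups, with the splitting time t shared by all loci.

```python
def loci_by_group(assignment):
    """Map each group label to the (1-based) loci assigned to it."""
    groups = {}
    for locus, group in enumerate(assignment, start=1):
        groups.setdefault(group, []).append(locus)
    return groups


def parameter_count(n_groups):
    """Number of demographic parameters when every group has its own rates.

    Each group contributes three effective sizes and two migration rates;
    the splitting time t is shared, so it is counted once.
    """
    sizes = 3 * n_groups
    migration_rates = 2 * n_groups
    return sizes + migration_rates + 1


a = (1, 1, 2, 2)
print(loci_by_group(a))
print(parameter_count(1), parameter_count(2))
```

With a single group this recovers the six parameters of the basic model, and with two groups it gives the six effective sizes and four migration rates described above plus the shared splitting time.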
Given genetic data from L independent loci, sampled from each of two closely related populations or species, the goal is to obtain an estimate of the vector of locus assignments, a, as well as the demographic parameters of the IM model, t and Φ. To connect the data to these unknowns we consider for locus l a genealogy, G_l, and for all loci the set of genealogies, G = (G_1, …, G_L), that describe the historical coancestry of the sampled sequences, including the tree topologies, as well as the times of coalescent and migration events (Hey and Nielsen 2004). As conceptualized by Felsenstein (1988) and now common practice in population genetics inference (e.g., Kuhner et al. 1998; Beaumont 1999; Beerli and Felsenstein 1999; Hey and Nielsen 2004; Kuhner 2006), we consider the range of possible genealogies by approximating an integration over the genealogical space. Following the approach developed by Hey and Nielsen (2007) this integration provides for the posterior probability of the parameters of interest,

$$\pi(\Phi, t, a \mid X) = \int \pi(\Phi \mid G, t, a)\, \pi(G, t, a \mid X)\, dG \qquad (1)$$

where π(Φ|G, t, a) is the conditional probability of the parameters given the genealogies, the splitting time, and the assignment, and π(G, t, a|X) is the probability of genealogies, splitting time, and assignment given the data. Although this integral is not analytically tractable except for very small sample sizes, as noted by Hey and Nielsen (2007), Equation 1 suggests a two-step Monte Carlo integration approximation. This works by first sampling genealogies, times of split, and assignment vectors from π(G, t, a|X), which are then used to approximate the posterior of the demographic parameters π(Φ|X) in a second step (Hey and Nielsen 2007). Although this approach does not provide an estimate of the joint posterior π(Φ, t, a|X), it does provide estimates of the marginal posterior for a and t (first step), as well as the marginal posterior for Φ, which includes all of the rate parameters for genetic drift and gene flow (second step).
In the first step, a Markov chain Monte Carlo (MCMC) simulation is used to collect samples of <G, t, a> from the posterior π(G, t, a|X) ∝ f(X|G)π(G|t, a)π(t)π(a), where f(X|G) is the likelihood of the data given the genealogies, π(G|t, a) is the prior probability of the genealogies conditional on the times of split and assignment, π(t) is the prior of the times of split, and π(a) is the prior of the assignment vector. The likelihood f(X|G) is computed using conventional methods, such as by mapping mutations onto G in the case of the infinite-sites mutation model or by parameterizing the mutation process under a finite-sites model and using the pruning algorithm (Felsenstein 1981a). The prior probability π(G|t, a) is obtained by integrating over Φ (Hey and Nielsen 2007),

$$\pi(G \mid t, a) = \int \pi(G \mid \Phi, t, a)\, \pi(\Phi)\, d\Phi \qquad (2)$$

where π(Φ) is the prior distribution for the migration rates and effective population sizes, and π(G|Φ, t, a) is the probability of the genealogies conditional on the parameters and assignment. The calculation of this last term, π(G|Φ, t, a), is based on coalescent theory (Hey and Nielsen 2007; Hey 2010; Sousa et al. 2011) and is actually a fairly tractable function of quantities determined from G, including for each rate component of Φ (1) a count of the number of events across G that the rate pertains to and (2) a sum of the total rate for that parameter across G (see, e.g., the appendix to Hey 2010). The integration in Equation 2 likewise has a straightforward analytical solution. The sample of <G, t, a> values can be used directly to estimate the marginal posterior distributions for t and a. Thus, this first step approximates the marginal posterior π(t, a|X), providing estimates for the times of split and assignment of loci into groups.
The second step consists of using the sample of <G, t, a> values to estimate the marginal posterior for Φ. Applying Bayes’ theorem, the conditional probability of the parameters given the genealogies can be simplified to π(Φ|G, t, a) = π(G|Φ, t, a)π(Φ)/π(G|t, a). Given a sample of n genealogies, times of split and assignment from the posterior, (G^(i), t^(i), a^(i)) ∼ π(G, t, a|X) (i = 1, …, n), we estimate the marginal posterior distribution of the drift and migration parameters as

$$\pi(\Phi \mid X) \approx \frac{1}{n} \sum_{i=1}^{n} \pi(\Phi \mid G^{(i)}, t^{(i)}, a^{(i)}) = \frac{1}{n} \sum_{i=1}^{n} \frac{\pi(G^{(i)} \mid \Phi, t^{(i)}, a^{(i)})\, \pi(\Phi)}{\pi(G^{(i)} \mid t^{(i)}, a^{(i)})} \qquad (3)$$

Note that the marginal posterior π(Φ|X) is not conditioned on particular values for t or a, but is in effect estimated by integrating over these other parameters. In sum, given that the joint posterior π(Φ, t, a|X) can be expressed by Equation 1, we can apply the above two-step procedure to obtain the marginal posterior distributions π(t, a|X) (first step) and π(Φ|X) (second step) and hence estimate all the parameters of interest, including t, a, and Φ.
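The second step can be sketched for a single migration rate m under simplifying assumptions that are not taken from the source: a uniform prior on (0, m_max), and a genealogy probability proportional to m^c exp(-m f), where c is the count of migration events and f is the summed rate term for a sampled genealogy. The (c, f) pairs below are made-up stand-ins for the output of the MCMC step, and the normalizing integral is done numerically on a grid rather than analytically. The code averages the conditional posteriors over the samples, as in the estimator above; it is a toy illustration, not the authors' implementation.

```python
import math


def conditional_posterior(grid, count, rate_sum):
    """Density on the grid proportional to m**count * exp(-m * rate_sum).
    With a uniform prior the prior cancels, so normalizing over the grid
    plays the role of dividing by the prior probability of the genealogy."""
    logs = [count * math.log(m) - m * rate_sum for m in grid]
    peak = max(logs)                          # subtract the peak for stability
    unnorm = [math.exp(v - peak) for v in logs]
    width = grid[1] - grid[0]
    area = sum(unnorm) * width                # midpoint-rule integral
    return [u / area for u in unnorm]


def marginal_posterior(grid, samples):
    """Average the conditional posteriors over the sampled genealogies."""
    total = [0.0] * len(grid)
    for count, rate_sum in samples:
        density = conditional_posterior(grid, count, rate_sum)
        total = [t + d for t, d in zip(total, density)]
    return [t / len(samples) for t in total]


m_max = 5.0                                   # assumed upper bound of the prior
n_points = 500
step = m_max / n_points
grid = [(k + 0.5) * step for k in range(n_points)]  # midpoints avoid log(0)

# Hypothetical (event count, summed rate) pairs, one per sampled genealogy.
samples = [(3, 4.0), (2, 3.5), (4, 5.1), (3, 3.8)]

posterior = marginal_posterior(grid, samples)
mode = grid[posterior.index(max(posterior))]
print(f"posterior mode of m: {mode:.3f}")
```

Because each conditional density is normalized before averaging, the averaged curve also integrates to one over the grid, and no particular value of t or a is conditioned on.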
In our exhaustive search for studies that have compared estimates of QST and FST indices between different populations, we were able to find 18 studies of 20 species, including four unpublished ones, which reported QST estimates (Table 2; see also: 73, 83). Most of these studies were conducted in plants (55%), whereas invertebrate (30%) and, in particular, vertebrate (15%) studies were scarce (Table 2). QST estimates in a given study were based, on average, on eight different traits (min–max = 1–24), with growth-related morphological and (juvenile) life history traits dominating (Table 2). FST estimates in most studies were based on allozymes, but a few microsatellite and RAPD studies, as well as one nuclear RFLP and one ribosomal DNA based study, have been conducted (Table 2). One study also reported an estimate of FST based on RFLP analysis of mtDNA, which was in agreement (after taking into account that the Ne for the mtDNA genome is 1/4 of that for nuclear markers; see, e.g., 18) with values returned from analyses of allozymes and microsatellites (55). Those few studies which have used more than one marker system to estimate FST return qualitatively (34, 40), or even quantitatively (55; see also: 34, Table 2), similar conclusions from the different marker types. Hence, in accordance with data from a number of studies (1), this suggests that the choice of markers to estimate FST values may not be of great concern (but see: 62, 29).
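The adjustment for the fourfold smaller Ne of mtDNA can be illustrated with a short sketch. It assumes Wright's island model at equilibrium, where FST = 1/(1 + 4Nm) for nuclear markers and FST = 1/(1 + Nm) for mtDNA (equal sex ratio and equal migration of the sexes); neither formula nor the function names come from the text, so this shows one common way to make such a comparison, not necessarily the one used in the cited study.

```python
def nm_from_fst(fst, ne_factor=4.0):
    """Solve FST = 1 / (1 + ne_factor * Nm) for Nm.
    ne_factor is 4 for nuclear markers and 1 for mtDNA under the
    assumptions stated above."""
    return (1.0 / fst - 1.0) / ne_factor


def nuclear_equivalent_fst(fst_mt):
    """Nuclear FST expected for the same Nm that produced an mtDNA FST."""
    nm = nm_from_fst(fst_mt, ne_factor=1.0)
    return 1.0 / (1.0 + 4.0 * nm)


fst_mt = 0.40                          # made-up mtDNA estimate
print(nuclear_equivalent_fst(fst_mt))  # same as fst_mt / (4 - 3 * fst_mt)
```

Under these assumptions an mtDNA estimate is expected to be larger than a nuclear one from the same populations, so the two should be compared only after such a conversion.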
Why Are They Important in Conservation Genetics?
Microsatellite markers are inherited from both parents, making them useful for parentage analysis (think paternity testing) and population genetic studies. They are useful for population genetic studies because many are highly polymorphic. If a microsatellite locus is polymorphic, it means that there is more than one potential allele at that locus (a specific marker site). Polymorphic loci can have more than 10, even more than 20, potential alleles in a given population. If populations are truly separate from each other, then these alleles are likely to be present in different frequencies in each population. These different allele frequencies increase the potential to observe genetic differences between populations if they exist.
For example, let’s assume we have two populations that are reproductively isolated, but the microsatellite marker we are using only has one or two alleles present for that locus. In addition, the alleles occur at similar frequencies in both populations. The microsatellite data would suggest that these two populations are either one continuous population, or at least had high levels of gene flow between the populations. In this case, the lack of allelic diversity would limit our ability to detect reproductive isolation. Now, let’s assume we are using a microsatellite marker that has 20 possible alleles (highly polymorphic). This large number of alleles increases the odds that allele frequencies will differ between the populations if reproductive isolation is occurring, thus increasing the likelihood of properly identifying the two populations as separate (reproductively isolated from each other). In addition, if the data from the highly polymorphic locus still leads to the conclusion that the two populations are not reproductively isolated, then there would be stronger genetic support suggesting the two populations were either one population or had large amounts of gene flow. In reality, data from multiple microsatellite markers, not just one locus, are used to characterize populations.
Similarly, in parentage analysis, highly polymorphic loci increase our ability to identify individuals. In this case, the large number of alleles increases the likelihood that each individual will have a unique genotype (considering their genotype across multiple loci) relative to other individuals in the population. Similar to population studies, multiple loci are typically used in parentage analysis.
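One way to put numbers on this is the probability of identity, the chance that two unrelated individuals share the same genotype. The sketch below uses the standard single-locus expression (the sum of p_i^4 over alleles plus the sum of (2 p_i p_j)^2 over allele pairs), which assumes Hardy-Weinberg proportions, and multiplies across loci assuming the loci are independent. The allele frequencies are made up for illustration, and the function names are hypothetical.

```python
from itertools import combinations
from math import prod


def probability_of_identity(freqs):
    """Chance that two unrelated individuals match at one locus,
    assuming Hardy-Weinberg genotype proportions."""
    homozygotes = sum(p ** 4 for p in freqs)
    heterozygotes = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygotes + heterozygotes


def multilocus_identity(loci):
    """Combine loci by multiplication, assuming they are independent."""
    return prod(probability_of_identity(freqs) for freqs in loci)


two_alleles = [0.5, 0.5]             # low-diversity locus
twenty_alleles = [1 / 20] * 20       # highly polymorphic locus

print(probability_of_identity(two_alleles))
print(probability_of_identity(twenty_alleles))
print(multilocus_identity([twenty_alleles] * 5))
```

The match probability is far smaller for the 20-allele locus than for the 2-allele locus, and it shrinks further with every additional locus, which is why several polymorphic loci are typed together.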
What can cause these differences in allele frequency between populations? Gene flow between populations can act to make allele frequencies more similar, even at low rates of exchange. Genetic drift, a random process, can cause allele frequencies to fluctuate within a population from one generation to the next. New alleles can also arise through mutation, which changes the number of repeating segments (increasing or decreasing the number of repeats). For example, the ATAGATAGATAGATAGATAGATAGATAGATAG in the figure above could be shortened to ATAGATAGATAG.
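The repeat-number change in this example can be sketched in a few lines of Python. The helper names are hypothetical, and the sketch assumes the allele consists only of back-to-back copies of a non-empty motif, with no flanking sequence.

```python
def count_repeats(sequence, motif):
    """Count consecutive copies of a non-empty motif from the start of sequence."""
    count = 0
    while sequence.startswith(motif, count * len(motif)):
        count += 1
    return count


def change_repeats(sequence, motif, change):
    """Return a new allele whose repeat number differs by `change`
    (positive adds repeats, negative removes them, never below one)."""
    new_count = max(1, count_repeats(sequence, motif) + change)
    return motif * new_count


long_allele = "ATAGATAGATAGATAGATAGATAGATAGATAG"
n_repeats = count_repeats(long_allele, "ATAG")

# Remove repeats until three copies of the motif remain.
short_allele = change_repeats(long_allele, "ATAG", 3 - n_repeats)
print(n_repeats, short_allele)
```

Alleles at a microsatellite locus are distinguished by this repeat count, so each gain or loss of repeats produces a different allele.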