Identity By Descent vs Identity By State


Background

The concepts of Identity By Descent (IBD) vs Identity By State (IBS) are central in population genetics, yet I fail to fully wrap my head around the definitions.

You can find examples where my understanding of IBD vs IBS is quite poor in @DermotHarnett's answer here or in the comments with @PaulStaab here. @PaulStaab suggests that different authors have used different definitions of IBD and IBS.

What is unclear to me

From what I remember from Hartl and Clark (I don't have the book with me to quote), IBD depends on an arbitrary time threshold in the past: if the coalescent event occurred before that threshold, we call the two alleles IBS (identical by state) rather than IBD. The idea that the concept of IBD depends on an arbitrary threshold bothers me though!

I suppose that two alleles can be IBD without being IBS when a mutation or recombination event occurred in the middle of the sequence of interest after their coalescence (that is, more recently than the coalescence, looking backward in time). I suppose two alleles can be IBS without being IBD only if we use an arbitrary threshold that is more recent than their coalescence time, or if convergent/parallel evolution happened.

Questions

  • Does IBD depend on an arbitrary threshold?
  • Are there several definitions of IBD and IBS in use?
  • Does IBD imply IBS?
  • Does IBS imply IBD?
  • Can you please give a short review of these definitions to clear things up?

The issue of IBD and IBS can indeed be confusing.

Identity by state (IBS) means that two individuals carry the same allele at a specific locus, whether or not they are related to each other. When they are unrelated, the similarity probably arose from independent but identical mutational events.

With IBD, on the other hand, two individuals share the same allele because they inherited it from a common ancestor (coancestry).

Alleles that are identical by descent are also identical by state, but the converse is not true (see also https://www.biostars.org/p/174048/#174049 and Powell et al 2010, Nature Reviews Genetics 11, 800-805 (1 November 2010) | doi:10.1038/nrg2865).

Now, about the threshold that you mention in your question, I am not entirely sure what you mean. Eventually, all individuals are traceable to an MRCA (most recent common ancestor). So I think the threshold you talk about refers to the size of the population under analysis. Depending on the size of your population, you will have different proportions of alleles that are IBD and IBS.

Hope this helps.


This won't be a complete explanation: I'm bothered by these questions myself. But I'll say what I do know.

First:

Yes, Identity By Descent (IBD) is defined relative to a chosen threshold number of generations, at least in the sense that I understand it (there may be more than one sense; I think some definitions allow for mutation). At the chosen number of generations back, we assume that all of those ancestors were unrelated. This seems troublesome! But two things might help it seem less so. First: when we ask if two individuals are related or not, we're really asking whether they're more closely related than the average relatedness of the background population. Second: the pedigree method of estimating kinship and inbreeding coefficients is just that, a way of estimating, which may fail if its assumptions fail (e.g. if the ancestors at the threshold were not randomly chosen from a randomly mating population).

To illustrate, consider a question about a fictional individual with a fictional family tree (spoilers for Game of Thrones, season 1): How inbred is Joffrey Baratheon?

We know that Joffrey Baratheon is not, in fact, the son of King Robert Baratheon, but rather the product of secret incest between his mother Queen Cersei Lannister and her twin brother Jaime Lannister. If that were all we knew about the situation, we would draw the following pedigree:

[Figure: pedigree in which Joffrey is the offspring of the full siblings Cersei and Jaime Lannister]

By using this pedigree, we are implicitly assuming that Cersei and Jaime's parents are unrelated, i.e. randomly chosen from a large randomly mating background population (and assuming that if Cersei and Jaime are more likely than random individuals to share an allele, this is due solely to IBD from one or the other of those randomly chosen parents, i.e. due to their being close relatives). Based on this assumption, applying the usual method to this pedigree, we estimate Joffrey's coefficient of inbreeding as $0.25$.

However, if we look deeper into the Lannister family tree, we realize that Cersei and Jaime are actually more closely related than a typical brother-sister pair: their parents were not randomly chosen, but were themselves related (though not closely enough to be scandalous for Westerosi society); their father Tywin Lannister and their mother Joanna Lannister were cousins. Using this information we draw a more complete pedigree:

[Figure: extended pedigree tracing Tywin and Joanna Lannister back to their shared grandparents, Gerold Lannister and Rohanne Webber]

Using the usual pedigree method again, we are now making a different assumption. We are no longer assuming that Joffrey's grandparents Tywin and Joanna were randomly chosen from the background population. We are instead assuming that Joffrey's great-great-grandparents, Gerold Lannister and the Lady Rohanne Webber, were randomly chosen from the population. This assumption seems more reasonable. Since this new pedigree implies more inbreeding in Joffrey's ancestry, we guess that the method should give a higher estimate for his inbreeding coefficient: indeed it does, giving us an estimate of $0.28125$.

Here's how this illustrates the subjectivity of defining IBD. When we used the first pedigree, we were asking: What's the probability that both alleles at a locus in Joffrey's genome, are descended from the same allele in the grandparent generation? When we used the second pedigree, we were asking: What's the probability that both alleles at a locus in Joffrey's genome, are descended from the same allele in the great-great-grandparent generation? These are two different questions, and of course gave two different answers. The trouble, of course, is that the question of how inbred Joffrey is relative to the background population, only has one answer (which the pedigree answers may estimate more or less well)!
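The two pedigree estimates can be reproduced with a short recursive kinship calculation. This is only a sketch of the standard recursive rule (an individual's inbreeding coefficient equals the kinship coefficient of its parents), not any particular package's implementation. The names `son_A`, `son_B`, `spouse_A` and `spouse_B` are hypothetical placeholders for the intermediate generation, and the spouses are assumed to be unrelated founders.

```python
from functools import lru_cache

def make_kinship(pedigree):
    """pedigree maps name -> (father, mother); founders map to (None, None).
    Entries must be listed so that parents appear before their children."""
    order = {name: i for i, name in enumerate(pedigree)}

    @lru_cache(maxsize=None)
    def kinship(a, b):
        # Unknown parents (above the founder generation) are assumed unrelated.
        if a is None or b is None:
            return 0.0
        if a == b:
            father, mother = pedigree[a]
            return 0.5 * (1.0 + kinship(father, mother))
        # Always expand the younger individual, so we never step past an ancestor.
        if order[a] < order[b]:
            a, b = b, a
        father, mother = pedigree[a]
        return 0.5 * (kinship(father, b) + kinship(mother, b))

    return kinship

def inbreeding(pedigree, child):
    """Inbreeding coefficient of child = kinship of its two parents."""
    father, mother = pedigree[child]
    return make_kinship(pedigree)(father, mother)

# First pedigree: Tywin and Joanna treated as unrelated founders.
SHALLOW = {
    "Tywin": (None, None), "Joanna": (None, None),
    "Cersei": ("Tywin", "Joanna"), "Jaime": ("Tywin", "Joanna"),
    "Joffrey": ("Jaime", "Cersei"),
}

# Second pedigree: Tywin and Joanna are first cousins via Gerold and Rohanne.
DEEP = {
    "Gerold": (None, None), "Rohanne": (None, None),
    "spouse_A": (None, None), "spouse_B": (None, None),  # hypothetical founders
    "son_A": ("Gerold", "Rohanne"), "son_B": ("Gerold", "Rohanne"),
    "Tywin": ("son_A", "spouse_A"), "Joanna": ("son_B", "spouse_B"),
    "Cersei": ("Tywin", "Joanna"), "Jaime": ("Tywin", "Joanna"),
    "Joffrey": ("Jaime", "Cersei"),
}

print(inbreeding(SHALLOW, "Joffrey"))  # 0.25
print(inbreeding(DEEP, "Joffrey"))     # 0.28125
```

The only thing that changes between the two calls is where the founder generation is placed, which is exactly the subjectivity discussed above.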

Second:

Although important coefficients like the "inbreeding coefficient" or "coefficient of relationship" are often presented in textbooks as being defined in terms of the probability of some pair of alleles being IBD, this cannot be the actual definition. This is because probabilities can't be negative, but relatedness can be (if two individuals are less related than average), as can the inbreeding coefficient (if the animal is outbred, i.e. its parents had negative relatedness). The possibility of negative relatedness even has interesting evolutionary implications. As a sort of flipside of Hamilton's rule for altruistic kin selection - where it can be evolutionarily beneficial for a gene to cause its bearer to harm itself in order to help a relative, i.e. an individual more likely than average to also carry a copy of that gene - negative relatedness makes it possible to have spiteful anti-kin selection - where it pays a gene to cause its bearer to harm itself, for no benefit at all, but solely to harm an individual less likely than average to carry a copy of itself (and more likely to carry its competitors)!

Indeed, in his original formulation of these and related coefficients, Sewall Wright did not invoke probabilities or identity by descent at all. For relatedness he talked about the correlation between individuals' allelic states (this requires assigning numbers to alleles, e.g. $A=1$ and $a=0$). Note that correlation coefficients can be negative. The "probability of IBD" interpretation was introduced by Malécot; it's made it into population genetics textbooks mostly just because it's easier to teach (despite the seemingly paradoxical subjectivity of reference number of generations, and the inconsistency with the possibility of negative relatedness).

Wright's explanation of the inbreeding coefficient is more intuitive: he points out that the most important effect of inbreeding is reduction in heterozygosity. Consider a randomly mating population: an individual randomly chosen from it, will have a certain fraction of its loci heterozygous. But an inbred individual will have fewer loci heterozygous (more loci homozygous). Malécot interpreted that excess homozygosity as being "due to coancestry [since a subjectively chosen reference generation]", but we can ignore that conceptual baggage, and just talk about the excess homozygosity (deficiency in heterozygosity) itself. Thus a better definition of the inbreeding coefficient is

$F = \frac{H_e - H}{H_e}$

where $H_e$ is the heterozygosity you would expect of the offspring of a random mating given the population's allele frequencies, and $H$ is the inbred individual's actual heterozygosity. A good presentation of this interpretation appears in Hartl's Primer of Population Genetics.

Note that this definition, unlike the probability definition (but like the correlation definition of relatedness) can be negative: outbred individuals have less homozygosity (more heterozygosity) than expected under random mating.
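As a small illustration of the heterozygosity-based definition, the sketch below computes $F$ for one individual. It assumes biallelic loci with known population allele frequencies, takes $H_e$ as the mean of $2p(1-p)$ across loci, and codes genotypes as 0, 1 or 2 copies of one allele; the function name and the example data are made up for illustration.

```python
def inbreeding_from_heterozygosity(genotypes, allele_freqs):
    """F = (H_e - H) / H_e for a single individual.

    genotypes    : per-locus allele counts (0, 1 or 2); 1 means heterozygous
    allele_freqs : per-locus population frequency p of the counted allele
    """
    if len(genotypes) != len(allele_freqs) or not genotypes:
        raise ValueError("need one allele frequency per genotyped locus")
    n_loci = len(genotypes)
    # Expected heterozygosity under random mating, averaged over loci.
    h_exp = sum(2.0 * p * (1.0 - p) for p in allele_freqs) / n_loci
    if h_exp == 0.0:
        raise ValueError("expected heterozygosity is zero; F is undefined")
    # Observed heterozygosity: fraction of this individual's loci that are het.
    h_obs = sum(1 for g in genotypes if g == 1) / n_loci
    return (h_exp - h_obs) / h_exp

freqs = [0.5, 0.5, 0.5, 0.5]
print(inbreeding_from_heterozygosity([1, 0, 2, 0], freqs))  # 0.5  (excess homozygosity)
print(inbreeding_from_heterozygosity([1, 1, 1, 1], freqs))  # -1.0 (excess heterozygosity)
```

The second call shows the point made above: nothing in this definition prevents $F$ from being negative.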


What are Identity-by-state and Identity-by-descent?

I am confused about the meaning of these terms. Can someone explain them clearly?

Identity by descent (IBD) is a subset of Identity by state (IBS). If two individuals share some stretch of DNA sequence (i.e. they are identical across this sequence), that is inherently IBS. It might be IBD, if they both inherited that stretch of DNA from a common ancestor, and it basically hasn't undergone shuffling by recombination or even reversionary mutations.

In the wikipedia page example, the portion of orange chromosomes shared between these two individuals (the ones at the bottom) has been inherited from the same source, which we can see through the pedigree.

In this example, under the 'Identity by type' (IBT) figure, see how the allele A1 gains a mutation to become A2, but then gains a second mutation to go back to being A1, and is now IBT to the extant A1 lineage. IBT is sort of analogous to IBS, except you measure it by phenotype rather than at the actual sequence level (the 'state' in IBS). So in this way, the allele is IBT/IBS but not IBD.

Someone else will have to weigh in here, as this is more a question I'm asking to expand on the topic: as far as I know, it'd be pretty rare for a reversionary mutation to create an IBS scenario? I think it would more often happen where a haplogroup is split in two, those two sub-groups descend through the generations for a while, and then come back together in an individual? In this way, that portion of DNA would be identical (IBS) to the original haplogroup, which maybe never split at all in some other lineage. But they would not be IBD, because the haplogroup had been split apart and had separate trajectories through the generations?



In: Genetics Selection Evolution , Vol. 44, No. 1, 28, 2012, p. Article 28.

Research output : Contribution to journal › Article › peer-review

T1 - The importance of identity-by-state information for the accuracy of genomic selection

N1 - Luan, Tu; Woolliams, John A; Odegard, Jorgen; Dolezal, Marlies; Roman-Ponce, Sergio I; Bagnato, Alessandro; Meuwissen, Theo He. Genet Sel Evol. 2012 Aug 31;44(1):28.

N2 - ABSTRACT: BACKGROUND: It is commonly assumed that prediction of genome-wide breeding values in genomic selection is achieved by capitalizing on linkage disequilibrium between markers and QTL but also on genetic relationships. Here, we investigated the reliability of predicting genome-wide breeding values based on population-wide linkage disequilibrium information, based on identity-by-descent relationships within the known pedigree, and to what extent linkage disequilibrium information improves predictions based on identity-by-descent genomic relationship information. METHODS: The study was performed on milk, fat, and protein yield, using genotype data on 35 706 SNP and deregressed proofs of 1086 Italian Brown Swiss bulls. Genome-wide breeding values were predicted using a genomic identity-by-state relationship matrix and a genomic identity-by-descent relationship matrix (averaged over all marker loci). The identity-by-descent matrix was calculated by linkage analysis using one to five generations of pedigree data. RESULTS: We showed that genome-wide breeding values prediction based only on identity-by-descent genomic relationships within the known pedigree was as or more reliable than that based on identity-by-state, which implicitly also accounts for genomic relationships that occurred before the known pedigree. Furthermore, combining the two matrices did not improve the prediction compared to using identity-by-descent alone. Including different numbers of generations in the pedigree showed that most of the information in genome-wide breeding values prediction comes from animals with known common ancestors less than four generations back in the pedigree. CONCLUSIONS: Our results show that, in pedigreed breeding populations, the accuracy of genome-wide breeding values obtained by identity-by-descent relationships was not improved by identity-by-state information. Although, in principle, genomic selection based on identity-by-state does not require pedigree data, it does use the available pedigree structure. Our findings may explain why the prediction equations derived for one breed may not predict accurate genome-wide breeding values when applied to other breeds, since family structures differ among breeds.




Graphical calculations of the expectation of Identity by Descent

Consider a single locus A in two individuals 1 & 2. Designate the four alleles at this locus in these two individuals as A1 & A2, and A3 & A4, respectively. [We do not care whether the four alleles are similar or dissimilar by allelic state (DNA sequence)]. Individuals 3 and 4 are full sibs, and inherit one allele from each parent. Individual 5 is their offspring. What is the probability that 5 will inherit two alleles that are identical by descent at this locus?

A1 passes from 1 to 3 with a probability of 1/2, AND to 4 with a probability of 1/2. Given that it has passed to both offspring, it then passes from 3 to 5 with a probability of 1/2, AND from 4 to 5 with a probability of 1/2. Thus, the joint probability that A1 has passed from 1 to 5 through both 3 and 4 is (1/2)^4 = 1/16.

The same calculation can be repeated for alleles A2, A3 & A4. Then, the combined probability that 5 has two alleles identical by descent at this locus, that is, has two copies of A1 OR A2 OR A3 OR A4, is 1/16 + 1/16 + 1/16 + 1/16 = 1/4.

Since this calculation applies for every locus individually, the expected fraction of loci that are identical by descent is also 1/4, which is the Inbreeding Coefficient for the offspring of full sibs.
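The path-counting argument above can be checked by brute force. The sketch below enumerates all 2^6 equally likely outcomes of the six meioses in this pedigree (1→3, 2→3, 1→4, 2→4, 3→5, 4→5) and counts how often individual 5 receives two copies of the same founder allele. The function name is made up for illustration.

```python
from fractions import Fraction
from itertools import product

def prob_ibd_offspring_of_full_sibs():
    """Exact P(individual 5 carries two IBD alleles) by enumerating meioses."""
    parent1 = ("A1", "A2")
    parent2 = ("A3", "A4")
    ibd = 0
    total = 0
    # Each meiosis transmits allele 0 or allele 1 of the parent with prob 1/2.
    for m in product((0, 1), repeat=6):
        sib3 = (parent1[m[0]], parent2[m[1]])   # individual 3
        sib4 = (parent1[m[2]], parent2[m[3]])   # individual 4
        child5 = (sib3[m[4]], sib4[m[5]])       # individual 5
        total += 1
        if child5[0] == child5[1]:              # same founder allele twice
            ibd += 1
    return Fraction(ibd, total)

print(prob_ibd_offspring_of_full_sibs())  # 1/4
```

Because the four founder alleles carry distinct labels, equality of labels here means identity by descent, regardless of whether the alleles are identical by state.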


2 THEORY

2.1 Relationship between genomic predictions using T, G or Gx

(1) where Qj is an n × 2 matrix with row i being [2,0], [1,1], or [0,2] depending on the genotype for i being AA, AB or BB at locus j; αj is a column vector with the additive effects for alleles A and B at locus j. The first element of row i of matrix Qj/2 gives the probability that a randomly sampled allele from locus j of individual i is an A allele, and the second element gives the probability that it is a B allele. So, the matrix of IBS probabilities for locus j can be written as , and the GARM of Nejati-Javaremi et al. (1997) can be written as follows:

where . Now, if a specific Bayesian model is adopted that assigns a prior for α with mean vector 0 and covariance matrix , the covariance matrix for the breeding values becomes:

where . Suppose for simplicity, each animal has a phenotypic value and μ is the only fixed effect in the model. Then, both the BVM:

(2) have the same expected value: 1μ, and covariance matrix: for y, where we have assumed . Thus, under normality, these two models are said to be equivalent and will lead to the same inference on breeding values (Henderson, 1984). (3) where M is an n × k matrix with element i of column j equal to 0, 1 or 2 depending on the number of A alleles at locus j for animal i, βj = αjA - αjB, αjA and αjB are the effects at locus j for alleles A and B, , and . To understand how these models are related, note that , where Mj is column j of M. Using this identity for Qj, the model for the vector of breeding values can be written as

where and are row vectors from M corresponding to i and i′, and this difference only involves the substitution effects in β. In fact, comparing the mixed model equations (MME) for the allele effects model 2 to the MME for the substitution effects model 3, it can be shown that the solutions from these MME are related as (Appendix 1):

(4) where b = , and . This BVM and the SEM 3 are equivalent (Strandén & Garrick, 2009) and will lead to identical BLUPs for breeding values (Henderson, 1984). In most current applications, the columns of M are centred to have mean zero and scaled to have variance of one. Then, a small value is added to the diagonals of G to make it non-singular (Vela-Avitúa, Meuwissen, Luan, & Ødegård, 2015). It can be shown that centring the columns of M has no effect on the BLUPs of the substitution effects and thus will also not affect the BLUP of ai (Martini et al., 2017; Strandén & Christensen, 2011). Scaling, on the other hand, results in a GARM that unequally weights the IBS matrices across the loci. Following scaling of genotype covariates, the breeding values can be modelled as follows:

where Mj is the centred or uncentred genotype covariates that have not been scaled, pj is the allele frequency at locus j, qj = 1 − pj, , and . Then, assuming a prior distribution for γ with mean vector 0 and covariance matrix , the covariance matrix for b becomes: (5) where is the GARM computed from scaled genotype covariates, and .
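The per-locus IBS construction described at the start of section 2.1 can be sketched directly from the definition of Qj/2: if each row holds the probabilities of drawing an A or a B allele from an individual, then the probability that alleles drawn from individuals i and i′ are identical in state is the dot product of their rows. The code below is an illustration under that reading only. The function names are made up, and averaging the per-locus matrices over loci is an assumption about how they are combined, not a statement of the paper's exact formula.

```python
# Rows of Q_j / 2: (P(sampled allele is A), P(sampled allele is B)).
ALLELE_PROBS = {"AA": (1.0, 0.0), "AB": (0.5, 0.5), "BB": (0.0, 1.0)}

def ibs_probability_matrix(genotypes):
    """n x n matrix of IBS probabilities for one locus.

    genotypes: list of 'AA' / 'AB' / 'BB', one entry per individual.
    Entry (i, k) is the probability that one allele sampled at random from
    individual i and one from individual k are the same allele.
    """
    rows = [ALLELE_PROBS[g] for g in genotypes]
    return [[a[0] * b[0] + a[1] * b[1] for b in rows] for a in rows]

def mean_ibs_matrix(genotypes_by_locus):
    """Average of the per-locus matrices (assumed combination rule)."""
    per_locus = [ibs_probability_matrix(g) for g in genotypes_by_locus]
    n = len(per_locus[0])
    k = len(per_locus)
    return [[sum(m[i][j] for m in per_locus) / k for j in range(n)]
            for i in range(n)]

print(ibs_probability_matrix(["AA", "AB", "BB"]))
# [[1.0, 0.5, 0.0], [0.5, 0.5, 0.5], [0.0, 0.5, 1.0]]
```

Note that a heterozygote has an IBS probability of 0.5 with itself, since two alleles drawn at random from an AB genotype match only half of the time.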

Identity-by-descent

These calculations are not LD-aware. It is usually a good idea to perform some form of LD-based pruning before invoking them.

--genome ['gz'] ['rel-check'] ['full'] ['unbounded'] ['nudge']
--ppc-gap <distance in kbs>
--min <minimum PI_HAT value>
--max <maximum PI_HAT value>

--genome invokes an IBS/IBD computation, and then writes a report with the following fields to plink.genome:

FID1     Family ID for first sample
IID1     Individual ID for first sample
FID2     Family ID for second sample
IID2     Individual ID for second sample
RT       Relationship type inferred from .fam/.ped file
EZ       IBD sharing expected value, based on just .fam/.ped relationship
Z0       P(IBD=0)
Z1       P(IBD=1)
Z2       P(IBD=2)
PI_HAT   Proportion IBD, i.e. P(IBD=2) + 0.5*P(IBD=1)
PHE      Pairwise phenotypic code (1, 0, -1 = AA, AU, and UU pairs, respectively)
DST      IBS distance, i.e. (IBS2 + 0.5*IBS1) / (IBS0 + IBS1 + IBS2)
PPC      IBS binomial test
RATIO    HETHET : IBS0 SNP ratio (expected value 2)
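The two derived columns in this report are simple functions of the others. As a quick sanity check, the formulas given for PI_HAT and DST can be written out as below. This is only a restatement of those two formulas in Python with made-up function names, not PLINK code.

```python
def pi_hat(z1, z2):
    """Proportion IBD: P(IBD=2) + 0.5 * P(IBD=1)."""
    return z2 + 0.5 * z1

def ibs_distance(ibs0, ibs1, ibs2):
    """DST: (IBS2 + 0.5 * IBS1) / (IBS0 + IBS1 + IBS2), from variant counts."""
    total = ibs0 + ibs1 + ibs2
    if total == 0:
        raise ValueError("no nonmissing variants for this pair")
    return (ibs2 + 0.5 * ibs1) / total

# Expected IBD sharing for some standard relationships (Z1, Z2):
print(pi_hat(1.0, 0.0))   # parent-child: 0.5
print(pi_hat(0.5, 0.25))  # full siblings: 0.5
print(pi_hat(0.0, 0.0))   # unrelated: 0.0

print(ibs_distance(10, 40, 50))  # 0.7
```

Parent-child and full-sibling pairs have the same expected PI_HAT, so the Z0/Z1/Z2 columns are needed to tell them apart.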

Note that there is one entry per pair of samples, so this file can be very large. The 'gz' modifier causes the output to be gzipped, while 'rel-check' removes pairs of samples with different FIDs, and --min/--max removes lines with PI_HAT values below/above the given cutoff(s).

The 'full' modifier causes the following fields to be added:

IBS0     Number of IBS 0 nonmissing variants
IBS1     Number of IBS 1 nonmissing variants
IBS2     Number of IBS 2 nonmissing variants
HOMHOM   Number of IBS 0 SNP pairs used in PPC test
HETHET   Number of IBS 2 het/het SNP pairs used in PPC test

By default, the minimum distance between informative pairs of SNPs used in the pairwise population concordance (PPC) test is 500k base pairs; you can change this with the --ppc-gap flag.

The underlying P(IBD=0/1/2) estimator sometimes yields numbers outside the range [0,1]; by default, these are clipped. The 'unbounded' modifier turns off this clipping. Then, if PI_HAT^2 < P(IBD=2), 'nudge' adjusts the final estimates to P(IBD=0) := (1-p)^2, P(IBD=1) := 2p(1-p), and P(IBD=2) := p^2, where p is the current PI_HAT.
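The 'nudge' rule can be read as replacing an implausible (Z0, Z1, Z2) triple with the binomial proportions implied by the current PI_HAT. The sketch below follows the description in the text only; it is not taken from the PLINK source, and the function name is made up.

```python
def nudge(z0, z1, z2):
    """Apply the 'nudge' adjustment described above to IBD estimates.

    If PI_HAT^2 < P(IBD=2), return ((1-p)^2, 2p(1-p), p^2) with p = PI_HAT;
    otherwise return the estimates unchanged.
    """
    p = z2 + 0.5 * z1          # current PI_HAT
    if p * p < z2:
        return ((1.0 - p) ** 2, 2.0 * p * (1.0 - p), p * p)
    return (z0, z1, z2)

print(nudge(0.5, 0.0, 0.5))    # (0.25, 0.5, 0.25): adjusted
print(nudge(0.25, 0.5, 0.25))  # (0.25, 0.5, 0.25): left as is
```

The adjusted values always sum to 1 and leave PI_HAT itself unchanged, since p^2 + 0.5 * 2p(1-p) = p.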

This estimator requires fairly accurate minor allele frequencies to work properly. Use --read-freq if you do not think your immediate dataset's empirical MAFs are representative.

--genome jobs can be subdivided with --parallel, which is substantially easier to use than PLINK 1.07 --genome-lists. (Since we are not aware of other practical applications of --genome-lists, that flag has been provisionally retired; contact us if you still need it.)

We may add more sophisticated IBD estimation routine(s) in the future if there is sufficient interest.

Runs of homozygosity

--homozyg [{group | group-verbose}] ['consensus-match'] ['extend'] ['subtract-1-from-lengths']
--homozyg-snp <min SNP count>
--homozyg-kb <min length>
--homozyg-density <max inverse density (kb/SNP)>
--homozyg-gap <max internal gap kb length>

--homozyg-window-snp <scanning window size>
--homozyg-window-het <max hets in scanning window hit>
--homozyg-window-missing <max missing calls in scanning window hit>
--homozyg-window-threshold <min scanning window hit rate>

If any of these flags are present, a set of run-of-homozygosity reports is generated using PLINK 1.07's scanning algorithm. See the original documentation for more details.

  • You may also want to try 'bcftools roh', which uses a HMM-based detection method. (We'll include a basic port of that command in PLINK 2.0 if there is sufficient interest.)
  • If you're satisfied with all the default settings described below, just use --homozyg with no modifiers. Otherwise, --homozyg lets you change a few binary settings:
    • The 'group[-verbose]' modifier adds a report on pools of overlapping runs of homozygosity. (This is triggered by --homozyg-match as well.) 'group-verbose' also produces a detailed report for each pool.
    • With 'group[-verbose]', 'consensus-match' causes pairwise segmental matches to be called based only on the SNPs in the entire pool's consensus segment, rather than all the SNPs in the pairwise intersection.
    • Due to how the scanning algorithm works, it is possible for a reported run of homozygosity to be adjacent to a few unincluded homozygous variants. This is generally harmless, but if you wish to extend the ROH to include them, use the 'extend' modifier. (Note that the --homozyg-density bound can prevent extension, and --homozyg-gap affects which variants are considered adjacent.)
    • By default, segment bp lengths are calculated as (<end bp position> - <start bp position> + 1). This is a minor change from PLINK 1.07, which does not add 1 at the end. For testing purposes, you can use the 'subtract-1-from-lengths' modifier to apply the old formula.

    In a "--homozyg group[-verbose]" run, pools of overlapping ROH are formed, then pairwise allelic matches within each pool are identified, then allelic-match groups are formed based on these matches. (More precisely, each group has a reference member marked with an appended '*' in the .hom.overlap 'GRP' column, and all other members of the group have pairwise allelic matches with the reference member.) By default, a pairwise match is defined as 0.95 or greater concordance between segments across jointly homozygous variants; you can change this threshold with --homozyg-match.

    --pool-size excludes all pools with fewer than the given number of segments from the report(s).


    Rapid detection of identity-by-descent tracts for mega-scale datasets

    The ability to identify segments of genomes identical-by-descent (IBD) is a part of standard workflows in both statistical and population genetics. However, traditional methods for finding local IBD across all pairs of individuals scale poorly leading to a lack of adoption in very large-scale datasets. Here, we present iLASH, IBD by LocAlity-Sensitive Hashing, an algorithm based on similarity detection techniques that shows equal or improved accuracy in simulations compared to the current leading method and speeds up analysis by several orders of magnitude on genomic datasets, making IBD estimation tractable for hundreds of thousands to millions of individuals. We applied iLASH to the Population Architecture using Genomics and Epidemiology (PAGE) dataset of ∼52,000 multi-ethnic participants, including several founder populations with elevated IBD sharing, which identified IBD segments on a single machine in an hour (∼3 minutes per chromosome compared to over 6 days per chromosome for a state-of-the-art algorithm). iLASH is able to efficiently estimate IBD tracts in very large-scale datasets, as demonstrated via IBD estimation across the entire UK Biobank (∼500,000 individuals), detecting nearly 13 billion pairwise IBD tracts shared between ∼11% of participants. In summary, iLASH enables fast and accurate detection of IBD, an upstream step in applications of IBD for population genetics and trait mapping.



    In: PLoS genetics , Vol. 7, No. 9, e1002287, 09.2011.

    Research output : Contribution to journal › Article › peer-review

    T1 - Inference of relationships in population data using identity-by-descent and identity-by-state

    N2 - It is an assumption of large, population-based datasets that samples are annotated accurately whether they correspond to known relationships or unrelated individuals. These annotations are key for a broad range of genetics applications. While many methods are available to assess relatedness that involve estimates of identity-by-descent (IBD) and/or identity-by-state (IBS) allele-sharing proportions, we developed a novel approach that estimates IBD0, 1, and 2 based on observed IBS within windows. When combined with genome-wide IBS information, it provides an intuitive and practical graphical approach with the capacity to analyze datasets with thousands of samples without prior information about relatedness between individuals or haplotypes. We applied the method to a commonly used Human Variation Panel consisting of 400 nominally unrelated individuals. Surprisingly, we identified identical, parent-child, and full-sibling relationships and reconstructed pedigrees. In two instances non-sibling pairs of individuals in these pedigrees had unexpected IBD2 levels, as well as multiple regions of homozygosity, implying inbreeding. This combined method allowed us to distinguish related individuals from those having atypical heterozygosity rates and determine which individuals were outliers with respect to their designated population. Additionally, it becomes increasingly difficult to identify distant relatedness using genome-wide IBS methods alone. However, our IBD method further identified distant relatedness between individuals within populations, supported by the presence of megabase-scale regions lacking IBS0 across individual chromosomes. We benchmarked our approach against the hidden Markov model of a leading software package (PLINK), showing improved calling of distantly related individuals, and we validated it using a known pedigree from a clinical study. 
The application of this approach could improve genome-wide association, linkage, heterozygosity, and other population genomics studies that rely on SNP genotype data.



    On the Probabilities of Identity States in Permutable Populations

    Génin and Clerget-Darpoux recently discussed the derivation of the probabilities of identity states for populations in which there was some degree of kinship, primarily to allow the extension of the classical affected-sib-pair method to such populations. It is argued here that their derivation makes certain assumptions that are valid only for some very restricted population models and that are not needed for an appropriate treatment. Here the probabilities of the identity states of two individuals with a given genealogical relationship are specified in terms of the kinship parameters of the underlying population, from which the founders of the individuals' genealogy have been randomly selected. It is argued that an appropriate representation for a permutable population, one in which gene identity does not depend on the pattern of genes across individuals, requires three parameters. This representation is related to that of Génin and Clerget-Darpoux and to that of Weir.


    Distance-phenotype analysis

    Case/control

    --groupdist [iteration count] [d]

    --ibs-test and --groupdist consider three subsets of the distance matrix: pairs of affected samples, affected-unaffected pairs, and pairs of unaffected samples. Each of these subsets has a distribution of pairwise genomic distances; --ibs-test uses permutation to estimate p-values re: which types of pairs are most similar (see here for details), while --groupdist focuses on the differences between the centers of these distributions and estimates standard errors via delete-d jackknife.

    To perform this type of analysis with scalar phenotype data, you may combine --ibs-test/--groupdist with the --tail-pheno flag. However, the distance-phenotype regression described next should be more informative.

    If --ibs-test is run with no parameters, 100000 permutations are used. If --groupdist is run with less than two parameters, d is set to <number of people>^0.6 rounded down; with no parameters, 100000 jackknife iterations are run.

    When combining these commands with --read-dists, units must match: "--distance triangle bin ibs" goes with --ibs-test, while "--distance triangle bin" goes with --groupdist.
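To make the delete-d jackknife concrete, here is a generic sketch that uses the default d rule described above (the number of people raised to the power 0.6, rounded down). It is not PLINK's implementation: it jackknifes an arbitrary statistic of a plain list of values, draws random delete-d subsets, and uses the textbook delete-d variance scaling (n - d) / (d * iterations). All function and parameter names are made up for illustration.

```python
import math
import random
from statistics import mean

def default_delete_d(n_people):
    """Default d: <number of people>^0.6, rounded down."""
    return int(math.floor(n_people ** 0.6))

def delete_d_jackknife_se(values, statistic=mean, d=None, iterations=1000, seed=0):
    """Delete-d jackknife standard error of statistic(values).

    Each iteration drops d randomly chosen observations and recomputes the
    statistic on the rest; the spread of those estimates is then rescaled.
    """
    n = len(values)
    if d is None:
        d = default_delete_d(n)
    if not 0 < d < n:
        raise ValueError("d must satisfy 0 < d < number of observations")
    rng = random.Random(seed)
    estimates = []
    for _ in range(iterations):
        dropped = set(rng.sample(range(n), d))
        kept = [v for i, v in enumerate(values) if i not in dropped]
        estimates.append(statistic(kept))
    centre = mean(estimates)
    scale = (n - d) / (d * iterations)
    return math.sqrt(scale * sum((e - centre) ** 2 for e in estimates))

print(default_delete_d(100))   # 15
print(default_delete_d(1000))  # 63
print(delete_d_jackknife_se(list(range(50))))
```

For the sample mean, this estimate should land close to the usual standard error s / sqrt(n), which gives a quick way to check the scaling.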

    Distance-QT regression

    --regress-distance [iteration count] [d]
    --regress-rel [iteration count] [d]

    These flags perform simple linear regressions and evaluate delete-d jackknife standard error estimates. --regress-distance regresses genomic distances on pairwise average phenotypes and vice versa, while --regress-rel regresses genomic relationships on pairwise average phenotypes and vice versa.

    With less than two parameters, d is set to <number of people>^0.6 rounded down. With no parameters, 100000 jackknife iterations are run.

    A previously calculated triangular binary distance matrix can be loaded as input to --regress-distance using --read-dists. There is currently no similar shortcut for --regress-rel.

