HapFABIA: Identification of very short segments of identity by descent characterized by rare variants in large sequencing data
This site contains supporting material to the manuscript "HapFABIA: Identification of very short segments of identity by descent characterized by rare variants in large sequencing data".
Summary
Our method HapFABIA identifies short identity by descent (IBD) segments that are tagged by rare variants in large sequencing data. Two haplotypes are identical by descent (IBD) if they share a segment that both inherited from a common ancestor. Current IBD methods reliably detect long IBD segments because many minor alleles in the segment are concordant between the two haplotypes. However, many cohort studies contain unrelated individuals who share only short IBD segments. Short IBD segments contain too few minor alleles to distinguish IBD from random allele sharing by recurrent mutations. New sequencing techniques improve the situation by providing rare variants, which convey more information on IBD than common variants because random minor allele sharing is less likely for rare variants than for common variants.
IBD segment (yellow) that descended from a founder to different individuals.
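The argument that rare variants carry more IBD information can be made concrete with a back-of-the-envelope calculation. The sketch below is not part of HapFABIA. It assumes that two haplotypes are drawn independently, that SNVs are independent of each other, and that all SNVs have the same minor allele frequency. The function name and the example frequencies are hypothetical.

```python
def chance_sharing_probability(maf, n_variants):
    """Probability that two independent haplotypes both carry the minor
    allele at each of n_variants SNVs, all with minor allele frequency maf.

    Simplifying assumptions (not from the manuscript): haplotypes are
    independent draws and SNVs are independent of each other.
    """
    per_snv = maf ** 2          # both haplotypes carry the minor allele
    return per_snv ** n_variants


# Hypothetical example: three shared minor alleles.
rare = chance_sharing_probability(0.01, 3)    # rare variants, MAF 1%
common = chance_sharing_probability(0.30, 3)  # common variants, MAF 30%

print(f"chance sharing, rare variants:   {rare:.3e}")
print(f"chance sharing, common variants: {common:.3e}")
```

Under these assumptions, sharing a few rare minor alleles by chance is many orders of magnitude less likely than sharing the same number of common ones, which is why even a short segment can be informative when it is tagged by rare variants.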
Short IBD segments are of interest because (i) they resolve the genetic structure on a fine scale and (ii) they can be assumed to be old. To detect short IBD segments, both the information supplied by rare variants and the information from more than two individuals should be utilized. These two characteristics are the basis for detecting short IBD segments by HapFABIA.
We propose biclustering to detect very short IBD segments that are shared among multiple individuals. Biclustering simultaneously clusters rows and columns of a matrix. In particular, it clusters row elements that are similar to each other on a subset of column elements. A genotype matrix has individuals (unphased) or chromosomes (phased) as row elements and SNVs as column elements. Entries in the genotype matrix usually count how often the minor allele of a particular SNV is present in a particular individual. Alternatively, minor allele likelihoods or dosages may be used. Individuals that share an IBD segment are similar to each other at minor alleles of SNVs (tagSNVs) which tag the IBD segment (see Figure below). Therefore, an IBD segment that is shared among individuals corresponds to a bicluster because these individuals are similar to one another at this segment. Identifying a bicluster means identifying tagSNVs (column bicluster elements) that tag an IBD segment and, simultaneously, identifying individuals (row bicluster elements) that possess the IBD segment.
Biclustering of a genotyping matrix. Left: original genotyping data matrix with individuals as row elements and SNVs as column elements. Minor alleles are indicated by violet bars and major alleles by yellow bars for each individual-SNV pair. Right: after sorting the rows, the detected bicluster can be seen in the top three individuals. They contain the same IBD segment which is marked in gold. Biclustering simultaneously clusters rows and columns of a matrix so that row elements (here individuals) are similar to each other on a subset of column elements (here the tagSNVs).
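The correspondence between a shared IBD segment and a bicluster can be illustrated on a toy genotype matrix. The sketch below is a deliberately naive stand-in and not the biclustering model used by HapFABIA: it seeds on the pair of rows sharing the most rare minor alleles and then collects all rows carrying those alleles. The matrix, the function names, and the thresholds `max_freq` and `min_tag` are hypothetical.

```python
from itertools import combinations


def build_toy_matrix():
    """Toy phased genotype matrix: 20 chromosomes (rows) x 12 SNVs (columns).
    An entry of 1 means the chromosome carries the minor allele."""
    n_rows, n_snvs = 20, 12
    G = [[0] * n_snvs for _ in range(n_rows)]
    for i in range(8):            # SNV 0 is common (8 of 20 carriers)
        G[i][0] = 1
    G[1][9] = 1                   # private minor alleles (noise)
    G[10][11] = 1
    for i in (3, 8, 15):          # planted IBD segment shared by three rows
        for j in (2, 3, 5, 7):    # its tagSNVs
            G[i][j] = 1
    return G


def naive_bicluster(G, max_freq=0.2, min_tag=3):
    """Return (row indices, tagSNV indices) of one bicluster, or ([], [])."""
    n_rows = len(G)
    # Keep only rare SNVs: minor allele present, but in at most max_freq of rows.
    rare = [j for j in range(len(G[0]))
            if 0 < sum(row[j] for row in G) / n_rows <= max_freq]
    # Seed: the pair of rows sharing the most rare minor alleles.
    best_shared = []
    for a, b in combinations(range(n_rows), 2):
        shared = [j for j in rare if G[a][j] and G[b][j]]
        if len(shared) > len(best_shared):
            best_shared = shared
    if len(best_shared) < min_tag:
        return [], []
    # Row bicluster elements: all rows carrying every tagSNV of the seed.
    members = [i for i in range(n_rows) if all(G[i][j] for j in best_shared)]
    return members, best_shared


members, tag_snvs = naive_bicluster(build_toy_matrix())
print("rows sharing the segment:", members)
print("tagSNVs:", tag_snvs)
```

Sorting the rows so that the returned members come first reproduces the situation in the right panel of the figure: the shared segment appears as a block of minor alleles. Real data contain genotyping and phasing errors, so an exact-match rule like the one above would be too strict in practice.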
Publication
- Abstract
- Full Text
- Supplementary Information
Research Report: IBD between Humans, Neandertals, and Denisovans
Software, Data, Source Codes
- HapFABIA method
Publication: Abstract or Full Text
Software: HapFABIA method for extracting very short IBD segments
- Data, results, source code, and scripts used in the analysis (hapFabia 0.90.0):
Data, results, source code, scripts
- Analysis box for doing your own analysis of IBD segment sharing between human, Denisovan, and Neandertal:
Analysis box for your analysis: All data have been prepared for the analysis of short IBD sharing between human, Denisovan, and Neandertal. All R scripts used to generate the results and plots of the manuscript and the supplementary information are provided, so doing your own analysis is simple. Go ahead!
Examples of Short IBD Segments in Chromosome 1 of the 1000 Genomes Project
Figures 1-6: Examples of IBD segments that were extracted from chromosome 1 of the 1000 Genomes Project. In these phased genotype data, phasing errors can be seen (yellow lines from the left-hand side).
Fig. 3: IBD segment observed in all populations.
Fig. 4: IBD segment shared by Africans and one admixed American. Phasing errors are again visible in the last two lines (NA20299) and in lines 11 and 12 (NA19248).
Short IBD Segments Found in Data from the Korean Personal Genome Project (KPGP)
The Korean Personal Genome Project (KPGP) is part of the international Personal Genome Project (PGP) and was established by the Genome Research Foundation (GRF). Thirty-nine human genomes were sequenced on an Illumina HiSeq 2000 platform with 30x to 40x coverage. The genotypes of these 38 Koreans and one Caucasian female are combined with the genotype data of the 1000 Genomes Project to extract short IBD segments by HapFABIA.
Data and results of the hapFabia IBD segment extraction on the KPGP data (hapFabia 0.90.0):
Data, results, and KPGP IBD segments
The KPGP data contain two twin pairs (KPGP88/KPGP89 and KPGP90/KPGP91) and a family (KPGP1-KPGP12). KPGP10 is a Caucasian female from the US. The relations are given in the following pedigree charts:
Pedigree charts for the KPGP individuals.
Figures K1-K7: Examples of short IBD segments from chromosome 1 of the KPGP combined with the 1000 Genomes Project.
Fig. K5: IBD segment exclusively shared by Koreans.
Fig. K6: IBD segment that is shared by both Korean twin pairs. Sequencing errors are visible because twins should have the same IBD segments.
Correlation between population proportions and ancient genomes based on short IBD segments
Pearson's correlation between the Denisova genome and different populations.
Fisher test for dependencies between the Denisova genome and different populations.
Pearson's correlation between the Neandertal genome and different populations.
Fisher test for dependencies between the Neandertal genome and different populations.
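For readers unfamiliar with the two statistics named in these captions, the following sketch shows how Pearson's correlation and a one-sided Fisher's exact test are computed in general. It does not reproduce the analysis of the manuscript. The pairing of a per-segment population proportion with a per-segment ancient-genome match indicator, the layout of the 2x2 table, and all numbers are hypothetical assumptions made for illustration.

```python
import math


def pearson_correlation(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability, with fixed margins, of a top-left count of
    at least a (hypergeometric upper tail).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)
    upper = min(row1, col1)
    tail = sum(math.comb(row1, k) * math.comb(row2, col1 - k)
               for k in range(a, upper + 1))
    return tail / denom


# Hypothetical per-segment values (assumption, not data from the manuscript):
# share of carriers from one population, and whether the segment matches
# the ancient genome (1) or not (0).
population_proportion = [0.1, 0.4, 0.5, 0.9]
matches_ancient = [0, 0, 1, 1]
print("Pearson r:", pearson_correlation(population_proportion, matches_ancient))

# Hypothetical 2x2 table: rows = segment matches ancient genome (yes/no),
# columns = segment observed in the population (yes/no).
print("Fisher p (one-sided):", fisher_exact_greater(3, 0, 0, 3))
```

The one-sided test is chosen here only to keep the sketch short; a two-sided test would additionally sum the probabilities of all tables that are at most as likely as the observed one.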