We recently developed HapFABIA (1) to identify very short segments of identity by descent (IBD) in large sequencing data. HapFABIA identifies IBD segments that are 100 times shorter than those found by current state-of-the-art methods: 10 kbp for HapFABIA vs. 1 Mbp for state-of-the-art methods. In experiments with artificial, simulated, and real genotyping data, HapFABIA outperformed its competitors in detecting short IBD segments (1). HapFABIA is based on biclustering (4), which in turn uses machine learning techniques derived from maximizing the posterior in a Bayes framework (11,5,7,8,10,12,6,9).
HapFABIA is designed to detect short IBD segments in genotype data obtained from next-generation sequencing (NGS), but it can also be applied to DNA microarray data. Especially in NGS data, HapFABIA exploits rare variants for IBD detection. Rare variants convey more information on IBD than common variants because random minor allele sharing is less likely for rare variants than for common variants (13). To detect short IBD segments, both the information supplied by rare variants and the information from IBD segments shared by more than two individuals should be used (13). HapFABIA uses both. The probability of randomly sharing a segment depends