The 1000 Genomes Data



We used HapFABIA to extract short IBD segments from the 1000 Genomes Project genotyping data (2), specifically the phase 1 integrated variant call set (version 1), which contains phased genotype calls for SNVs, short indels, and large deletions. This data set comprises 1,092 individuals (246 Africans, 181 Admixed Americans, 286 East Asians, and 379 Europeans), 36.6M SNVs, 3.8M short indels, and 14k large deletions. Chromosome 1 contains 3,201,157 SNVs, which are on average 78 bp apart and have an average minor allele frequency (MAF) of 0.06. Of these, 1,920,833 (60%) are rare (MAF $\leq$ 0.05), 684,171 (21.4%) are private (the minor allele is observed only once), 15,124 (0.47%) have an MAF of zero, and 581,029 (18.2%) are common (MAF $>$ 0.05). For IBD detection we kept only the rare SNVs, thereby excluding private SNVs.
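The four SNV classes above are disjoint and together account for every SNV on chromosome 1. The following is a minimal sketch of that classification, assuming MAF is the minor-allele count divided by the 2 × 1,092 = 2,184 haplotypes and that an MAF of exactly 0.05 counts as rare. The helper names `maf_class` and `keep_for_ibd` are hypothetical and not part of HapFABIA. The sketch also recomputes the reported percentages from the reported counts.

```python
# Hypothetical helpers (not HapFABIA code): classify an SNV into the four
# disjoint MAF classes used in the text, from its minor-allele count.
N_CHROMOSOMES = 2 * 1092  # assumption: 1,092 diploid individuals -> 2,184 haplotypes


def maf_class(minor_count: int, n_chrom: int = N_CHROMOSOMES) -> str:
    """Return 'zero', 'private', 'rare', or 'common' for one SNV."""
    if minor_count == 0:
        return "zero"        # minor allele not observed in the sample
    if minor_count == 1:
        return "private"     # minor allele observed only once
    maf = minor_count / n_chrom
    return "rare" if maf <= 0.05 else "common"


def keep_for_ibd(minor_count: int) -> bool:
    """Only rare, non-private SNVs are passed on to IBD detection."""
    return maf_class(minor_count) == "rare"


# Chromosome 1 counts reported in the text; the four classes sum to the total.
chr1_counts = {"rare": 1_920_833, "private": 684_171,
               "zero": 15_124, "common": 581_029}
total = sum(chr1_counts.values())
shares = {name: 100 * count / total for name, count in chr1_counts.items()}

for name, pct in shares.items():
    print(f"{name:>8}: {chr1_counts[name]:>9,} SNVs ({pct:.2f}%)")
```

With 2,184 haplotypes the rare/common boundary falls between 109 copies (MAF ≈ 0.0499) and 110 copies (MAF ≈ 0.0504) of the minor allele.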

Chromosome 1 was divided into intervals of 10,000 SNVs with adjacent intervals overlapping by 5,000 SNVs. After removing common and private SNVs, we applied HapFABIA to these intervals. We used HapFABIA with 40 iterations and estimated the parameter from the 1000 Genomes Project data.
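The interval scheme (10,000 SNVs per interval, each interval starting 5,000 SNVs after the previous one) can be sketched as follows. This assumes half-open index ranges over SNVs in positional order and assumes the last interval is truncated at the end of the chromosome. The function name `overlapping_intervals` is hypothetical, and HapFABIA's own tooling may handle the boundary differently.

```python
def overlapping_intervals(n_snvs: int, length: int = 10_000,
                          shift: int = 5_000) -> list[tuple[int, int]]:
    """Split SNV indices 0..n_snvs-1 into half-open windows [start, end).

    Consecutive windows start `shift` SNVs apart, so with length=10,000 and
    shift=5,000 adjacent windows share 5,000 SNVs. The final window is cut
    off at n_snvs (an assumption about how the chromosome end is handled).
    """
    if n_snvs <= 0:
        return []
    intervals = []
    start = 0
    while True:
        end = min(start + length, n_snvs)
        intervals.append((start, end))
        if end == n_snvs:  # reached the last SNV: stop
            break
        start += shift
    return intervals


# Apply the scheme to the 3,201,157 SNVs of chromosome 1.
windows = overlapping_intervals(3_201_157)
print(len(windows), windows[:2], windows[-1])
```

Under these assumptions chromosome 1 yields 640 intervals, the last one spanning indices 3,195,000 to 3,201,157. Removing common and private SNVs inside each interval afterwards leaves fewer than 10,000 SNVs per interval.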


Sepp Hochreiter 2013-11-13