ALFA
The NCBI ALlele Frequency Aggregator (ALFA)
pipeline was developed to compute allele frequencies for variants in dbGaP
across approved, unrestricted studies and to provide the data as
open access to the public through dbSNP. The goal of the ALFA project is
to make frequency data from over 1M dbGaP subjects open access in
future releases to facilitate the discovery and interpretation of common
and rare variants that have biological impact or cause disease.
dbGaP contains the results of over 1,200 studies that have investigated the interaction of genotype and phenotype. The database has over two million subjects and hundreds of millions of variants, along with thousands of phenotypes and molecular assay data. The harmonized ALFA data will allow the wider scientific community to access allele frequencies for millions of variants in dbGaP. Only dbGaP studies that have been approved by the submitting institutions for sharing of summary statistics are included in the open-access ALFA dataset. Genotype and associated individual-level data are accessible through dbGaP authorized access.
The R4 release, covering 408,709 subjects, included allele
counts and frequencies for 15.5 million RS sites, including 959,966 ClinVar RS IDs. More information about ALFA,
data access, webinars, and tutorials can be found at https://www.ncbi.nlm.nih.gov/snp/docs/gsr/alfa/. Questions about ALFA track data should be sent to snp-admin@ncbi.nlm.nih.gov.