A powerful statistical framework for generalization testing in GWAS, with application to the HCHS/SOL.

Title: A powerful statistical framework for generalization testing in GWAS, with application to the HCHS/SOL.
Publication Type: Publication
Year: 2017
Authors: Sofer T, Heller R, Bogomolov M, Avery CL, Graff M, North KE, Reiner AP, Thornton TA, Rice K, Benjamini Y, Laurie CC, Kerr KF
Journal: Genet Epidemiol
Volume: 41
Issue: 3
Pagination: 251-258
Date Published: 2017 Apr
ISSN: 1098-2272
Keywords: Algorithms; Computer Simulation; Follow-Up Studies; Genome, Human; Genome-Wide Association Study; Genomics; Hispanic or Latino; Humans; Linkage Disequilibrium; Models, Statistical; Phenotype; Polymorphism, Single Nucleotide
Abstract

In genome-wide association studies (GWAS), "generalization" is the replication of genotype-phenotype association in a population with different ancestry than the population in which it was first identified. Current practices for declaring generalizations rely on testing associations while controlling the family-wise error rate (FWER) in the discovery study, then separately controlling error measures in the follow-up study. This approach does not guarantee control over the FWER or false discovery rate (FDR) of the generalization null hypotheses. It also fails to leverage the two-stage design to increase power for detecting generalized associations. We provide a formal statistical framework for quantifying the evidence of generalization that accounts for the (in)consistency between the directions of associations in the discovery and follow-up studies. We develop the directional generalization FWER (FWER_g) and FDR (FDR_g) controlling r-values, which are used to declare associations as generalized. This framework extends to generalization testing when applied to a published list of single nucleotide polymorphism (SNP)-trait associations. Our methods control FWER_g or FDR_g under various SNP selection rules based on P-values in the discovery study. We find that it is often beneficial to use a more lenient P-value threshold than the genome-wide significance threshold. In a GWAS of total cholesterol in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), when testing all SNPs with P-values < 5×10^-8 (15 genomic regions) for generalization in a large GWAS of whites, we generalized SNPs from 15 regions. But when testing all SNPs with P-values < 6.6×10^-5 (89 regions), we generalized SNPs from 27 regions.

DOI: 10.1002/gepi.22029
Alternate Journal: Genet Epidemiol
PubMed ID: 28090672
PubMed Central ID: PMC5340573
Grant List:
HHSN268201300005C / HL / NHLBI NIH HHS / United States
P01 GM099568 / GM / NIGMS NIH HHS / United States
R01 HL129132 / HL / NHLBI NIH HHS / United States
N01HC65236 / HL / NHLBI NIH HHS / United States
N01HC65235 / HL / NHLBI NIH HHS / United States
N01HC65234 / HL / NHLBI NIH HHS / United States
N01HC65233 / HL / NHLBI NIH HHS / United States
N01HC65237 / HL / NHLBI NIH HHS / United States
MS#: 0389
Manuscript Lead/Corresponding Author Affiliation: HCHS/SOL Genetic Analysis Center - University of Washington, Seattle
ECI: Yes
Manuscript Affiliation: HCHS/SOL Genetic Analysis Center - University of Washington, Seattle
Manuscript Status: Published