Search | BVS Regional Portal

Filtering the rejection set while preserving false discovery rate control.

Katsevich, Eugene; Sabatti, Chiara; Bogomolov, Marina.

J Am Stat Assoc ; 118(541): 165-176, 2023.

Article in English | MEDLINE | ID: mdl-37346227

ABSTRACT

Scientific hypotheses in a variety of applications have domain-specific structures, such as the tree structure of the International Classification of Diseases (ICD), the directed acyclic graph structure of the Gene Ontology (GO), or the spatial structure in genome-wide association studies. In the context of multiple testing, the resulting relationships among hypotheses can create redundancies among rejections that hinder interpretability. This leads to the practice of filtering rejection sets obtained from multiple testing procedures, which may in turn invalidate their inferential guarantees. We propose Focused BH, a simple, flexible, and principled methodology to adjust for the application of any pre-specified filter. We prove that Focused BH controls the false discovery rate under various conditions, including when the filter satisfies an intuitive monotonicity property and the p-values are positively dependent. We demonstrate in simulations that Focused BH performs well across a variety of settings, and illustrate this method's practical utility via analyses of real datasets based on ICD and GO.
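The thresholding idea behind filtering a rejection set can be illustrated with a short Python sketch. This is a minimal sketch under our reading of the abstract, not the authors' software. It assumes the procedure scans candidate thresholds t, applies the pre-specified filter to the BH-style rejection set R(t) = {j : p_j ≤ t} to get U(t), estimates the false discovery proportion of the filtered set as m·t/|U(t)|, and reports U(t) at the largest t whose estimate is at most q. The names `focused_bh` and `keep_most_specific`, the toy tree, and the p-values are hypothetical.

```python
from typing import Callable, Dict, Optional, Set


def focused_bh(
    pvals: Dict[str, float],
    filt: Callable[[Set[str]], Set[str]],
    q: float = 0.1,
) -> Set[str]:
    """Return the filtered rejection set chosen by an (assumed) Focused BH rule."""
    m = len(pvals)
    best: Set[str] = set()
    # Scan candidate thresholds t in increasing order; the last t that passes wins.
    for t in sorted(set(pvals.values())):
        rejected = {h for h, p in pvals.items() if p <= t}  # R(t)
        kept = filt(rejected)                                # U(t) = filter(R(t))
        # Assumed FDP estimate for the filtered set: m * t / |U(t)|.
        if kept and m * t / len(kept) <= q:
            best = kept
    return best


# Hypothetical two-level tree: parent of each node, roots map to None.
parent: Dict[str, Optional[str]] = {"A": None, "B": None, "A1": "A", "A2": "A", "B1": "B"}


def keep_most_specific(rejected: Set[str]) -> Set[str]:
    """Hypothetical filter: drop any rejected node that has a rejected direct child."""
    parents_of_rejected = {parent[h] for h in rejected if parent[h] is not None}
    return {h for h in rejected if h not in parents_of_rejected}


pvals = {"A": 0.001, "A1": 0.002, "A2": 0.2, "B": 0.03, "B1": 0.035}
print(sorted(focused_bh(pvals, lambda s: set(s), q=0.1)))  # ['A', 'A1', 'B', 'B1']
print(sorted(focused_bh(pvals, keep_most_specific, q=0.1)))  # ['A1', 'B1']
```

With the identity filter the rule reduces to ordinary BH and reports both parents and children; with the hypothetical filter only the most specific nodes survive, and the error estimate is computed on the filtered count rather than on the unfiltered BH set.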

Hypotheses on a tree: new error rates and testing strategies.

Bogomolov, Marina; Peterson, Christine B; Benjamini, Yoav; Sabatti, Chiara.

Biometrika ; 108(3): 575-590, 2021 Sep.

Article in English | MEDLINE | ID: mdl-36825068

ABSTRACT

We introduce a multiple testing procedure that controls global error rates at multiple levels of resolution. Conceptually, we frame this problem as the selection of hypotheses that are organized hierarchically in a tree structure. We describe a fast algorithm and prove that it controls relevant error rates given certain assumptions on the dependence between the p-values. Through simulations, we demonstrate that the proposed procedure provides the desired guarantees under a range of dependency structures and that it has the potential to gain power over alternative methods. Finally, we apply the method to studies on the genetic regulation of gene expression across multiple tissues and on the relation between the gut microbiome and colorectal cancer.

Genetic variation and gene expression across multiple tissues and developmental stages in a nonhuman primate.

Jasinska, Anna J; Zelaya, Ivette; Service, Susan K; Peterson, Christine B; Cantor, Rita M; Choi, Oi-Wa; DeYoung, Joseph; Eskin, Eleazar; Fairbanks, Lynn A; Fears, Scott; Furterer, Allison E; Huang, Yu S; Ramensky, Vasily; Schmitt, Christopher A; Svardal, Hannes; Jorgensen, Matthew J; Kaplan, Jay R; Villar, Diego; Aken, Bronwen L; Flicek, Paul; Nag, Rishi; Wong, Emily S; Blangero, John; Dyer, Thomas D; Bogomolov, Marina; Benjamini, Yoav; Weinstock, George M; Dewar, Ken; Sabatti, Chiara; Wilson, Richard K; Jentsch, J David; Warren, Wesley; Coppola, Giovanni; Woods, Roger P; Freimer, Nelson B.

Nat Genet ; 49(12): 1714-1721, 2017 Dec.

Article in English | MEDLINE | ID: mdl-29083405

ABSTRACT

By analyzing multitissue gene expression and genome-wide genetic variation data in samples from a vervet monkey pedigree, we generated a transcriptome resource and produced the first catalog of expression quantitative trait loci (eQTLs) in a nonhuman primate model. This catalog contains more genome-wide significant eQTLs per sample than comparable human resources and identifies sex- and age-related expression patterns. Findings include a master regulatory locus that likely has a role in immune function and a locus regulating hippocampal long noncoding RNAs (lncRNAs), whose expression correlates with hippocampal volume. This resource will facilitate genetic investigation of quantitative traits, including brain and behavioral phenotypes relevant to neuropsychiatric disorders.

Subjects

Chlorocebus aethiops/genetics; Gene Expression Profiling; Genetic Variation; Quantitative Trait Loci/genetics; Animals; Brain/growth & development; Brain/metabolism; Chlorocebus aethiops/growth & development; Genome-Wide Association Study; Genotype; Humans; Phenotype; Polymorphism, Single Nucleotide

A powerful statistical framework for generalization testing in GWAS, with application to the HCHS/SOL.

Sofer, Tamar; Heller, Ruth; Bogomolov, Marina; Avery, Christy L; Graff, Mariaelisa; North, Kari E; Reiner, Alex P; Thornton, Timothy A; Rice, Kenneth; Benjamini, Yoav; Laurie, Cathy C; Kerr, Kathleen F.

Genet Epidemiol ; 41(3): 251-258, 2017 Apr.

Article in English | MEDLINE | ID: mdl-28090672

ABSTRACT

In genome-wide association studies (GWAS), "generalization" is the replication of a genotype-phenotype association in a population with different ancestry from the population in which it was first identified. Current practices for declaring generalizations rely on testing associations while controlling the family-wise error rate (FWER) in the discovery study, then separately controlling error measures in the follow-up study. This approach does not guarantee control over the FWER or false discovery rate (FDR) of the generalization null hypotheses. It also fails to leverage the two-stage design to increase power for detecting generalized associations. We provide a formal statistical framework for quantifying the evidence of generalization that accounts for the (in)consistency between the directions of associations in the discovery and follow-up studies. We develop the directional generalization FWER (FWER_g) and FDR (FDR_g) controlling r-values, which are used to declare associations as generalized. This framework extends to generalization testing when applied to a published list of single nucleotide polymorphism (SNP)-trait associations. Our methods control FWER_g or FDR_g under various SNP selection rules based on P-values in the discovery study. We find that it is often beneficial to use a more lenient P-value threshold than the genome-wide significance threshold. In a GWAS of total cholesterol in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), when testing all SNPs with P-values < 5 × 10^-8 (15 genomic regions) for generalization in a large GWAS of whites, we generalized SNPs from 15 regions. But when testing all SNPs with P-values < 6.6 × 10^-5 (89 regions), we generalized SNPs from 27 regions.

Subjects

Genome, Human; Genome-Wide Association Study/methods; Hispanic or Latino/genetics; Models, Statistical; Polymorphism, Single Nucleotide/genetics; Algorithms; Computer Simulation; Follow-Up Studies; Genomics; Humans; Linkage Disequilibrium; Phenotype

Characterization of Expression Quantitative Trait Loci in Pedigrees from Colombia and Costa Rica Ascertained for Bipolar Disorder.

Peterson, Christine B; Service, Susan K; Jasinska, Anna J; Gao, Fuying; Zelaya, Ivette; Teshiba, Terri M; Bearden, Carrie E; Cantor, Rita M; Reus, Victor I; Macaya, Gabriel; López-Jaramillo, Carlos; Bogomolov, Marina; Benjamini, Yoav; Eskin, Eleazar; Coppola, Giovanni; Freimer, Nelson B; Sabatti, Chiara.

PLoS Genet ; 12(5): e1006046, 2016 May.

Article in English | MEDLINE | ID: mdl-27176483

ABSTRACT

The observation that variants regulating gene expression (expression quantitative trait loci, eQTL) are at a high frequency among SNPs associated with complex traits has made the genome-wide characterization of gene expression an important tool in genetic mapping studies of such traits. As part of a study to identify genetic loci contributing to bipolar disorder and other quantitative traits in members of 26 pedigrees from Costa Rica and Colombia, we measured gene expression in lymphoblastoid cell lines derived from 786 pedigree members. The study design enabled us to comprehensively reconstruct the genetic regulatory network in these families, provide estimates of heritability, identify eQTL, evaluate missing heritability for the eQTL, and quantify the number of different alleles contributing to any given locus. In the eQTL analysis, we utilize a recently proposed hierarchical multiple testing strategy which controls error rates regarding the discovery of functional variants. Our results elucidate the heritability and regulation of gene expression in this unique Latin American study population and identify a set of regulatory SNPs which may be relevant in future investigations of complex disease in this population. Since our subjects belong to extended families, we are able to compare traditional kinship-based estimates with those from more recent methods that depend only on genotype information.

Subjects

Bipolar Disorder/genetics; Genetic Predisposition to Disease; Genome-Wide Association Study; Quantitative Trait Loci/genetics; Alleles; Bipolar Disorder/pathology; Chromosome Mapping; Colombia; Costa Rica; Female; Gene Expression; Gene Regulatory Networks; Humans; Male; Polymorphism, Single Nucleotide/genetics

Many Phenotypes Without Many False Discoveries: Error Controlling Strategies for Multitrait Association Studies.

Peterson, Christine B; Bogomolov, Marina; Benjamini, Yoav; Sabatti, Chiara.

Genet Epidemiol ; 40(1): 45-56, 2016 Jan.

Article in English | MEDLINE | ID: mdl-26626037

ABSTRACT

The genetic basis of multiple phenotypes such as gene expression, metabolite levels, or imaging features is often investigated by testing a large collection of hypotheses, probing the existence of association between each of the traits and hundreds of thousands of genotyped variants. Appropriate multiplicity adjustment is crucial to guarantee replicability of findings, and the false discovery rate (FDR) is frequently adopted as a measure of global error. In the interest of interpretability, results are often summarized so that reporting focuses on variants discovered to be associated with some phenotypes. We show that applying FDR-controlling procedures on the entire collection of hypotheses fails to control the rate of false discovery of associated variants as well as the expected value of the average proportion of false discovery of phenotypes influenced by such variants. We propose a simple hierarchical testing procedure that allows control of both these error rates and provides a more reliable basis for the identification of variants with functional effects. We demonstrate the utility of this approach through simulation studies comparing various error rates and measures of power for genetic association studies of multiple traits. Finally, we apply the proposed method to identify genetic variants that impact flowering phenotypes in Arabidopsis thaliana, expanding the set of discoveries.
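A two-stage hierarchical procedure of the kind described in this abstract can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the authors' implementation: we assume each variant's phenotype p-values are combined with Simes' test, BH at level q selects variants, and within each selected variant BH is applied across phenotypes at the reduced level q·R/M (R selected variants out of M in total), a selection adjustment used in hierarchical FDR procedures. The function names and p-values are hypothetical.

```python
from typing import Dict, List


def bh(pvals: List[float], q: float) -> List[int]:
    """Benjamini-Hochberg step-up: indices rejected at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank  # keep the largest rank that passes (step-up)
    return sorted(order[:k])


def simes(pvals: List[float]) -> float:
    """Simes combination p-value: min over k of m * p_(k) / k, capped at 1."""
    m = len(pvals)
    return min(1.0, min(m * p / k for k, p in enumerate(sorted(pvals), start=1)))


def two_stage(pmat: Dict[str, List[float]], q: float) -> Dict[str, List[int]]:
    """Select variants, then phenotypes within each selected variant (assumed design)."""
    variants = list(pmat)
    M = len(variants)
    # Stage 1: one Simes p-value per variant, BH across variants at level q.
    selected = bh([simes(pmat[v]) for v in variants], q)
    R = len(selected)
    # Stage 2: BH across phenotypes within each selected variant at level q * R / M.
    return {variants[i]: bh(pmat[variants[i]], q * R / M) for i in selected}


# Hypothetical p-values: keys are variants, list positions are phenotypes.
pmat = {
    "snp1": [0.0001, 0.03, 0.6],
    "snp2": [0.02, 0.5, 0.9],
    "snp3": [0.4, 0.7, 0.8],
    "snp4": [0.001, 0.002, 0.3],
}
print(two_stage(pmat, q=0.1))  # {'snp1': [0, 1], 'snp2': [0], 'snp4': [0, 1]}
```

In this toy run snp3 is screened out at the variant level, so none of its phenotypes are examined, and the second stage runs at 0.1 × 3/4 = 0.075. Reporting is organized by variant first, which matches the variant-focused summaries the abstract describes.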

Subjects

Genetic Association Studies; Arabidopsis/genetics; Arabidopsis/physiology; Computer Simulation; Flowers/physiology; Genome-Wide Association Study; Models, Genetic; Phenotype; Polymorphism, Single Nucleotide; Reproducibility of Results

Deciding whether follow-up studies have replicated findings in a preliminary large-scale omics study.

Heller, Ruth; Bogomolov, Marina; Benjamini, Yoav.

Proc Natl Acad Sci U S A ; 111(46): 16262-7, 2014 Nov 18.

Article in English | MEDLINE | ID: mdl-25368172

ABSTRACT

We propose a formal method to declare that findings from a primary study have been replicated in a follow-up study. Our proposal is appropriate for primary studies that involve large-scale searches for rare true positives (i.e., needles in a haystack). Our proposal assigns an r value to each finding; this is the lowest false discovery rate at which the finding can be called replicated. Examples are given and software is available.
