1.
Proc Natl Acad Sci U S A ; 119(30): e2122788119, 2022 07 26.
Article in English | MEDLINE | ID: mdl-35867822

ABSTRACT

Compositional analysis is based on the premise that a relatively small proportion of taxa are differentially abundant, while the ratios of the relative abundances of the remaining taxa remain unchanged. Most existing methods use log-transformed data, but log-transformation of data with pervasive zero counts is problematic, and these methods cannot always control the false discovery rate (FDR). Further, high-throughput microbiome data such as 16S amplicon or metagenomic sequencing are subject to experimental biases that are introduced in every step of the experimental workflow. McLaren et al. [eLife 8, e46923 (2019)] have recently proposed a model for how these biases affect relative abundance data. Motivated by this model, we show that the odds ratios in a logistic regression comparing counts in two taxa are invariant to experimental biases. With this motivation, we propose logistic compositional analysis (LOCOM), a robust logistic regression approach to compositional analysis, that does not require pseudocounts. Inference is based on permutation to account for overdispersion and small sample sizes. Traits can be either binary or continuous, and adjustment for confounders is supported. Our simulations indicate that LOCOM always preserved FDR and had much improved sensitivity over existing methods. In contrast, analysis of composition of microbiomes (ANCOM) and ANCOM with bias correction (ANCOM-BC)/ANOVA-Like Differential Expression tool (ALDEx2) had inflated FDR when the effect sizes were small and large, respectively. Only LOCOM was robust to experimental biases in every situation. The flexibility of our method for a variety of microbiome studies is illustrated by the analysis of data from two microbiome studies. Our R package LOCOM is publicly available.


Subjects
Microbiota, Logistic Models, Metagenomics/methods, Microbiota/genetics, Sequence Analysis
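
The invariance claim in the abstract above can be checked numerically. The following is a minimal sketch, assuming the multiplicative bias model attributed to McLaren et al. (each taxon's observed proportion is its true proportion times a taxon-specific efficiency, renormalized). The function names, proportions, and efficiencies are illustrative assumptions and are not taken from the LOCOM package.

```python
import numpy as np

def log_odds_ratio(p_group1, p_group0, taxon, ref):
    """Log odds ratio of observing `taxon` rather than `ref`, group 1 vs group 0."""
    return (np.log(p_group1[taxon] / p_group1[ref])
            - np.log(p_group0[taxon] / p_group0[ref]))

def apply_bias(p, efficiency):
    """Multiply true proportions by taxon-specific efficiencies and renormalize."""
    q = p * efficiency
    return q / q.sum()

# Hypothetical true relative abundances of three taxa in two groups.
true0 = np.array([0.50, 0.30, 0.20])
true1 = np.array([0.20, 0.50, 0.30])
# Hypothetical measurement efficiencies (the experimental bias).
eff = np.array([1.0, 4.0, 0.25])

obs0 = apply_bias(true0, eff)
obs1 = apply_bias(true1, eff)

lor_true = log_odds_ratio(true1, true0, taxon=1, ref=0)
lor_obs = log_odds_ratio(obs1, obs0, taxon=1, ref=0)
```

The efficiency ratio of the two taxa appears in both groups and cancels in the difference of log ratios, so `lor_obs` matches `lor_true` even though the observed proportions themselves are distorted.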
2.
Bioinformatics ; 39(11), 2023 11 01.
Article in English | MEDLINE | ID: mdl-37930883

ABSTRACT

SUMMARY: There are compelling reasons to test compositional hypotheses about microbiome data. We present here linear decomposition model-centered log ratio (LDM-clr), an extension of our LDM approach to allow fitting linear models to centered-log-ratio-transformed taxa count data. As LDM-clr is implemented within the existing LDM program, this extension enjoys all the features supported by LDM, including a compositional analysis of differential abundance at both the taxon and community levels, while allowing for a wide range of covariates and study designs for either association or mediation analysis. AVAILABILITY AND IMPLEMENTATION: LDM-clr has been added to the R package LDM, which is available on GitHub at https://github.com/yijuanhu/LDM.


Subjects
Microbiota, Linear Models, Research Design
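
For readers unfamiliar with the centered log-ratio (clr) transformation mentioned in the abstract above, here is a minimal sketch. The pseudocount used to handle zero counts is an assumption made for illustration; the abstract does not state how LDM-clr treats zeros, and this code is not taken from the LDM package.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a samples-by-taxa count table.

    `pseudocount` is a hypothetical choice for handling zero counts.
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    log_x = np.log(x)
    # Subtract each sample's mean log value (the log of its geometric mean).
    return log_x - log_x.mean(axis=1, keepdims=True)

table = np.array([[10, 0, 5],
                  [200, 40, 0]])
transformed = clr(table)
```

Each transformed row sums to zero, and with strictly positive counts and no pseudocount the result does not change when a sample's counts are multiplied by a constant, which is what makes the transform suitable for compositional data.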
3.
Stat Med ; 43(2): 279-295, 2024 01 30.
Article in English | MEDLINE | ID: mdl-38124426

ABSTRACT

The use of Monte-Carlo (MC) $p$-values when testing the significance of a large number of hypotheses is now commonplace. In large-scale hypothesis testing, we will typically encounter at least some $p$-values near the threshold of significance, which require a larger number of MC replicates than $p$-values that are far from the threshold. As a result, some incorrect conclusions can be reached due to MC error alone; for hypotheses near the threshold, even a very large number (eg, $10^6$) of MC replicates may not be enough to guarantee conclusions reached using MC $p$-values. Gandy and Hahn (GH) [6-8] have developed the only method that directly addresses this problem. They defined a Monte-Carlo error rate (MCER) to be the probability that any decisions on accepting or rejecting a hypothesis based on MC $p$-values are different from decisions based on ideal $p$-values; their method then makes decisions by controlling the MCER. Unfortunately, the GH method is frequently very conservative, often making no rejections at all and leaving a large number of hypotheses "undecided". In this article, we propose MERIT, a method for large-scale MC hypothesis testing that also controls the MCER but is more statistically efficient than the GH method. Through extensive simulation studies, we demonstrate that MERIT controls the MCER while making more decisions that agree with the ideal $p$-values than GH does. We also illustrate our method by an analysis of gene expression data from a prostate cancer study.


Subjects
Research Design, Humans, Computer Simulation, Probability, Monte Carlo Method
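
To make the notion of MC error in the abstract above concrete, the sketch below computes the standard add-one Monte Carlo p-value from a finite set of null replicates. This is not the MERIT or GH procedure; the function name, the half-normal null distribution, and the replicate count are assumptions chosen only for illustration.

```python
import numpy as np

def mc_p_value(observed, null_stats):
    """Add-one Monte Carlo p-value: (1 + #{null >= observed}) / (1 + B)."""
    null_stats = np.asarray(null_stats, dtype=float)
    exceed = int(np.sum(null_stats >= observed))
    return (exceed + 1) / (null_stats.size + 1)

# Two independent MC runs for the same observed statistic.
# The estimates generally differ, which is the MC error the abstract refers to.
rng = np.random.default_rng(1)
observed = 1.96
p_run1 = mc_p_value(observed, np.abs(rng.standard_normal(1000)))
p_run2 = mc_p_value(observed, np.abs(rng.standard_normal(1000)))
```

When the ideal p-value sits close to the significance threshold, two such runs can land on opposite sides of it, so a decision based on a single run may not match the decision based on the ideal p-value.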
4.
Bioinformatics ; 38(15): 3689-3697, 2022 08 02.
Article in English | MEDLINE | ID: mdl-35723568

ABSTRACT

MOTIVATION: PERMANOVA is currently the most commonly used method for testing community-level hypotheses about microbiome associations with covariates of interest. PERMANOVA can test for associations that result from changes in which taxa are present or absent by using the Jaccard or unweighted UniFrac distance. However, such presence-absence analyses face a unique challenge: confounding by library size (total sample read count), which occurs when library size is associated with covariates in the analysis. It is known that rarefaction (subsampling to a common library size) controls this bias but at the potential costs of information loss and the introduction of a stochastic component into the analysis. RESULTS: Here, we develop a non-stochastic approach to PERMANOVA presence-absence analyses that aggregates information over all potential rarefaction replicates without actual resampling, when the Jaccard or unweighted UniFrac distance is used. We compare this new approach to three possible ways of aggregating PERMANOVA over multiple rarefactions obtained from resampling: averaging the distance matrix, averaging the (element-wise) squared distance matrix and averaging the F-statistic. Our simulations indicate that our non-stochastic approach is robust to confounding by library size and outperforms each of the stochastic resampling approaches. We also show that, when overdispersion is low, averaging the (element-wise) squared distance outperforms averaging the unsquared distance, currently implemented in the R package vegan. We illustrate our methods using an analysis of data on inflammatory bowel disease in which samples from case participants have systematically smaller library sizes than samples from control participants. 
AVAILABILITY AND IMPLEMENTATION: We have implemented all the approaches described above, including the function for calculating the analytical average of the squared or unsquared distance matrix, in our R package LDM, which is available on GitHub at https://github.com/yijuanhu/LDM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Microbiota, Humans, Research Design, Gene Library
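
One way to see why averaging over all rarefaction replicates can be done without resampling is that, for subsampling without replacement, the probability that a given taxon survives rarefaction has a closed form (a hypergeometric tail). The sketch below shows only this single ingredient under that assumption; it is not the authors' formula for the averaged Jaccard or UniFrac distance, and the function name is hypothetical.

```python
from math import comb

def presence_probability(count, library_size, depth):
    """Probability that a taxon with `count` reads is still present after
    subsampling `depth` reads without replacement from `library_size` reads."""
    if depth > library_size:
        raise ValueError("rarefaction depth exceeds library size")
    # P(absent) = C(N - x, n) / C(N, n); comb returns 0 when n > N - x.
    return 1.0 - comb(library_size - count, depth) / comb(library_size, depth)

# A taxon with 3 reads in a library of 1000, rarefied to 100 reads.
p_present = presence_probability(3, 1000, 100)
```

The same taxon is more likely to be detected in a sample rarefied from a larger pool at a deeper depth, which is why unequal library sizes confound presence-absence analyses unless depth is equalized.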
5.
Bioinformatics ; 38(10): 2915-2917, 2022 05 13.
Article in English | MEDLINE | ID: mdl-35561163

ABSTRACT

SUMMARY: We previously developed the LDM for testing hypotheses about the microbiome that performs the test at both the community level and the individual taxon level. The LDM can be applied to relative abundance data and presence-absence data separately, which work well when associated taxa are abundant and rare, respectively. Here, we propose LDM-omni3 that combines LDM analyses at the relative abundance and presence-absence data scales, thereby offering optimal power across scenarios with different association mechanisms. The new LDM-omni3 test is available for the wide range of data types and analyses that are supported by the LDM. AVAILABILITY AND IMPLEMENTATION: The LDM-omni3 test has been added to the R package LDM, which is available on GitHub at https://github.com/yijuanhu/LDM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Microbiota, Data Collection
6.
PLoS Comput Biol ; 18(9): e1010509, 2022 09.
Article in English | MEDLINE | ID: mdl-36103548

ABSTRACT

BACKGROUND: Finding microbiome associations with possibly censored survival times is an important problem, especially as specific taxa could serve as biomarkers for disease prognosis or as targets for therapeutic interventions. The two existing methods for survival outcomes, MiRKAT-S and OMiSA, are restricted to testing associations at the community level and do not provide results at the individual taxon level. An ad hoc approach testing each taxon with a survival outcome using the Cox proportional hazard model may not perform well in the microbiome setting with sparse count data and small sample sizes. METHODS: We have previously developed the linear decomposition model (LDM) for testing continuous or discrete outcomes that unifies community-level and taxon-level tests into one framework. Here we extend the LDM to test survival outcomes. We propose to use the Martingale residuals or the deviance residuals obtained from the Cox model as continuous covariates in the LDM. We further construct tests that combine the results of analyzing each set of residuals separately. Finally, we extend PERMANOVA, the most commonly used distance-based method for testing community-level hypotheses, to handle survival outcomes in a similar manner. RESULTS: Using simulated data, we showed that the LDM-based tests preserved the false discovery rate for testing individual taxa and had good sensitivity. The LDM-based community-level tests and PERMANOVA-based tests had comparable or better power than MiRKAT-S and OMiSA. An analysis of data on the association of the gut microbiome and the time to acute graft-versus-host disease revealed several dozen associated taxa that would not have been achievable by any community-level test, as well as improved community-level tests by the LDM and PERMANOVA over those obtained using MiRKAT-S and OMiSA. 
CONCLUSIONS: Unlike existing methods, our new methods are capable of discovering individual taxa that are associated with survival times, which could be of important use in clinical settings.


Subjects
Gastrointestinal Microbiome, Microbiota, Linear Models, Proportional Hazards Models, Sample Size
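
The Martingale and deviance residuals named in the abstract above can be computed directly in the simplest case. This is a minimal sketch, assuming a Cox model with no covariates, where the baseline cumulative hazard reduces to the Nelson-Aalen estimator. The function name and data are hypothetical, and the published method may fit a Cox model with adjustment covariates instead.

```python
import numpy as np

def null_cox_residuals(time, event):
    """Martingale and deviance residuals from a covariate-free Cox model."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    # Nelson-Aalen cumulative hazard evaluated at each subject's own time.
    cum_hazard = np.zeros_like(time)
    for t in np.unique(time[event == 1]):
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & (event == 1))
        cum_hazard[time >= t] += deaths / at_risk
    martingale = event - cum_hazard
    # delta * log(delta - M) equals log(H) for events and 0 for censored subjects.
    safe_hazard = np.where(event == 1, cum_hazard, 1.0)
    log_term = np.where(event == 1, np.log(safe_hazard), 0.0)
    inner = np.maximum(-2.0 * (martingale + log_term), 0.0)  # guard rounding
    deviance = np.sign(martingale) * np.sqrt(inner)
    return martingale, deviance

# Hypothetical follow-up times and event indicators (1 = event, 0 = censored).
times = [2, 3, 3, 5, 8]
events = [1, 0, 1, 1, 0]
mart, dev = null_cox_residuals(times, events)
```

Either residual vector is continuous and has one value per subject, so it can be passed to a method that expects a continuous covariate. The Martingale residuals sum to zero, which gives a quick sanity check.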
7.
Clin Infect Dis ; 75(4): 665-672, 2022 09 10.
Article in English | MEDLINE | ID: mdl-34864949

ABSTRACT

BACKGROUND: Gestational weight gain above Institute of Medicine recommendations is associated with increased risk of pregnancy complications. The goal was to analyze the association between newer HIV antiretroviral regimens (ART) and gestational weight gain. METHODS: A retrospective cohort study of pregnant women with HIV-1 on ART. The primary outcome was incidence of excess gestational weight gain. Treatment effects were estimated by ART regimen type using log-linear models for relative risk (RR), adjusting for prepregnancy BMI and presence of detectable viral load at baseline. RESULTS: Three hundred three pregnant women were included in the analysis. Baseline characteristics, including prepregnancy BMI, viral load at prenatal care entry, and gestational age at delivery, were similar by ART regimen; 53% of the entire cohort had initiated ART before pregnancy (P = nonsignificant). Excess gestational weight gain occurred in 29% of the cohort. Compared with non-integrase strand transfer inhibitor (-INSTI) or tenofovir alafenamide fumarate (TAF)-exposed persons, receipt of INSTI+TAF showed a 1.7-fold increased RR of excess gestational weight gain (95% CI: 1.18-2.68; P < .01), while women who received tenofovir disoproxil fumarate had a 0.64-fold decreased RR (95% CI: .41-.99; P = .047) of excess gestational weight gain. INSTI alone was not significantly associated with excess weight gain in this population. The effect of TAF without INSTI could not be inferred from our data. There was no difference in neonatal, obstetric, or maternal outcomes between the groups. CONCLUSIONS: Pregnant women receiving ART with a combined regimen of INSTI and TAF have increased risk of excess gestational weight gain.


Subjects
Gestational Weight Gain, HIV Infections, HIV-1, Adenine/therapeutic use, Anti-Retroviral Agents/therapeutic use, Body Mass Index, Female, HIV Infections/drug therapy, Humans, Newborn Infant, Pregnancy, Pregnancy Outcome/epidemiology, Retrospective Studies
8.
Bioinformatics ; 37(12): 1652-1657, 2021 Jul 19.
Article in English | MEDLINE | ID: mdl-33479757

ABSTRACT

MOTIVATION: Many methods for testing association between the microbiome and covariates of interest (e.g. clinical outcomes, environmental factors) assume that these associations are driven by changes in the relative abundance of taxa. However, these associations may also result from changes in which taxa are present and which are absent. Analyses of such presence-absence associations face a unique challenge: confounding by library size (total sample read count), which occurs when library size is associated with covariates in the analysis. It is known that rarefaction (subsampling to a common library size) controls this bias, but at the potential cost of information loss as well as the introduction of a stochastic component into the analysis. Currently, there is a need for robust and efficient methods for testing presence-absence associations in the presence of such confounding, both at the community level and at the individual-taxon level, that avoid the drawbacks of rarefaction. RESULTS: We have previously developed the linear decomposition model (LDM) that unifies the community-level and taxon-level tests into one framework. Here, we present an extension of the LDM for testing presence-absence associations. The extended LDM is a non-stochastic approach that repeatedly applies the LDM to all rarefied taxa count tables, averages the residual sum-of-squares (RSS) terms over the rarefaction replicates, and then forms an F-statistic based on these average RSS terms. We show that this approach compares favorably to averaging the F-statistic from R rarefaction replicates, which can only be calculated stochastically. The flexible nature of the LDM allows discrete or continuous traits or interactions to be tested while allowing confounding covariates to be adjusted for. Our simulations indicate that our proposed method is robust to any systematic differences in library size and has better power than alternative approaches. 
We illustrate our method using an analysis of data on inflammatory bowel disease (IBD) in which cases have systematically smaller library sizes than controls. AVAILABILITY AND IMPLEMENTATION: The R package LDM is available on GitHub at https://github.com/yijuanhu/LDM in formats appropriate for Macintosh or Windows. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

9.
Stat Med ; 41(15): 2879-2893, 2022 07 10.
Article in English | MEDLINE | ID: mdl-35352841

ABSTRACT

Mediation models are a set of statistical techniques that investigate the mechanisms that produce an observed relationship between an exposure variable and an outcome variable in order to deduce the extent to which the relationship is influenced by intermediate mediator variables. For a case-control study, the most common mediation analysis strategy employs a counterfactual framework that permits estimation of indirect and direct effects on the odds ratio scale for dichotomous outcomes, assuming either binary or continuous mediators. While this framework has become an important tool for mediation analysis, we demonstrate that we can embed this approach in a unified likelihood framework for mediation analysis in case-control studies that leverages more features of the data (in particular, the relationship between exposure and mediator) to improve efficiency of indirect effect estimates. One important feature of our likelihood approach is that it naturally incorporates cases within the exposure-mediator model to improve efficiency. Our approach does not require knowledge of disease prevalence and can model confounders and exposure-mediator interactions, and is straightforward to implement in standard statistical software. We illustrate our approach using both simulated data and real data from a case-control genetic study of lung cancer.


Subjects
Statistical Models, Case-Control Studies, Epidemiologic Confounding Factors, Humans, Likelihood Functions, Odds Ratio
10.
Bioinformatics ; 36(14): 4106-4115, 2020 08 15.
Article in English | MEDLINE | ID: mdl-32315393

ABSTRACT

MOTIVATION: Methods for analyzing microbiome data generally fall into one of two groups: tests of the global hypothesis of any microbiome effect, which do not provide any information on the contribution of individual operational taxonomic units (OTUs); and tests for individual OTUs, which do not typically provide a global test of microbiome effect. Without a unified approach, the findings of a global test may be hard to resolve with the findings at the individual OTU level. Further, many tests of individual OTU effects do not preserve the false discovery rate (FDR). RESULTS: We introduce the linear decomposition model (LDM), that provides a single analysis path that includes global tests of any effect of the microbiome, tests of the effects of individual OTUs while accounting for multiple testing by controlling the FDR, and a connection to distance-based ordination. The LDM accommodates both continuous and discrete variables (e.g. clinical outcomes, environmental factors) as well as interaction terms to be tested either singly or in combination, allows for adjustment of confounding covariates, and uses permutation-based P-values that can control for sample correlation. The LDM can also be applied to transformed data, and an 'omnibus' test can easily combine results from analyses conducted on different transformation scales. We also provide a new implementation of PERMANOVA based on our approach. For global testing, our simulations indicate the LDM provided correct type I error and can have comparable power to existing distance-based methods. For testing individual OTUs, our simulations indicate the LDM controlled the FDR well. In contrast, DESeq2 often had inflated FDR; MetagenomeSeq generally had the lowest sensitivity. The flexibility of the LDM for a variety of microbiome studies is illustrated by the analysis of data from two microbiome studies. We also show that our implementation of PERMANOVA can outperform existing implementations. 
AVAILABILITY AND IMPLEMENTATION: The R package LDM is available on GitHub at https://github.com/yijuanhu/LDM in formats appropriate for Macintosh or Windows. CONTACT: yijuan.hu@emory.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Microbiota, Linear Models
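
The abstract above refers to PERMANOVA and to permutation-based P-values. As background, the following is a minimal sketch of the classical one-factor PERMANOVA pseudo-F statistic with a permutation P-value. It is not the LDM implementation; the function names, permutation count, and toy data are assumptions made for illustration.

```python
import numpy as np

def pseudo_f(dist, groups):
    """Classical one-factor PERMANOVA pseudo-F from a square distance matrix."""
    sq = np.asarray(dist, dtype=float) ** 2
    groups = np.asarray(groups)
    n = len(groups)
    labels = np.unique(groups)
    # Total sum of squares from all pairwise squared distances.
    ss_total = sq[np.triu_indices(n, k=1)].sum() / n
    ss_within = 0.0
    for g in labels:
        idx = np.where(groups == g)[0]
        sub = sq[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_between = ss_total - ss_within
    a = len(labels)
    return (ss_between / (a - 1)) / (ss_within / (n - a))

def permanova(dist, groups, n_perm=999, seed=0):
    """Observed pseudo-F and its permutation P-value (labels are shuffled)."""
    rng = np.random.default_rng(seed)
    observed = pseudo_f(dist, groups)
    perm = np.array([pseudo_f(dist, rng.permutation(groups))
                     for _ in range(n_perm)])
    p_value = (np.sum(perm >= observed) + 1) / (n_perm + 1)
    return observed, p_value

# Toy data: Euclidean distances between one-dimensional observations.
x = np.array([1.0, 2.0, 3.0, 7.0, 8.0, 9.0])
d = np.abs(x[:, None] - x[None, :])
labels = np.array([0, 0, 0, 1, 1, 1])
f_obs, p_val = permanova(d, labels)
```

With Euclidean distances on a single variable, the pseudo-F coincides with the ordinary one-way ANOVA F-statistic, which makes a convenient check before moving to ecological distances such as Jaccard or UniFrac.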
11.
Bioinformatics ; 34(7): 1157-1163, 2018 04 01.
Article in English | MEDLINE | ID: mdl-29186324

ABSTRACT

Motivation: Inferring population structure is important for both population genetics and genetic epidemiology. Principal components analysis (PCA) has been effective in ascertaining population structure with array genotype data but can be difficult to use with sequencing data, especially when low depth leads to uncertainty in called genotypes. Because PCA is sensitive to differences in variability, PCA using sequencing data can result in components that correspond to differences in sequencing quality (read depth and error rate), rather than differences in population structure. We demonstrate that even existing methods for PCA specifically designed for sequencing data can still yield biased conclusions when used with data having sequencing properties that are systematically different across different groups of samples (i.e. sequencing groups). This situation can arise in population genetics when combining sequencing data from different studies, or in genetic epidemiology when using historical controls such as samples from the 1000 Genomes Project. Results: To allow inference on population structure using PCA in these situations, we provide an approach that is based on using sequencing reads directly without calling genotypes. Our approach is to adjust the data from different sequencing groups to have the same read depth and error rate so that PCA does not generate spurious components representing sequencing quality. To accomplish this, we have developed a subsampling procedure to match the depth distributions in different sequencing groups, and a read-flipping procedure to match the error rates. We average over subsamples and read flips to minimize loss of information. We demonstrate the utility of our approach using two datasets from 1000 Genomes, and further evaluate it using simulation studies. Availability and implementation: TASER-PC software is publicly available at http://web1.sph.emory.edu/users/yhu30/software.html. Contact: yijuan.hu@emory.edu. 
Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Population Genetics/methods, Human Genome, High-Throughput Nucleotide Sequencing/methods, Principal Component Analysis, Software, Algorithms, Humans, DNA Sequence Analysis/methods
12.
PLoS Genet ; 12(5): e1006040, 2016 05.
Article in English | MEDLINE | ID: mdl-27152526

ABSTRACT

Next-generation sequencing of DNA provides an unprecedented opportunity to discover rare genetic variants associated with complex diseases and traits. However, the common practice of first calling underlying genotypes and then treating the called values as known is prone to false positive findings, especially when genotyping errors are systematically different between cases and controls. This happens whenever cases and controls are sequenced at different depths, on different platforms, or in different batches. In this article, we provide a likelihood-based approach to testing rare variant associations that directly models sequencing reads without calling genotypes. We consider the (weighted) burden test statistic, which is the (weighted) sum of the score statistic for assessing effects of individual variants on the trait of interest. Because variant locations are unknown, we develop a simple, computationally efficient screening algorithm to estimate the loci that are variants. Because our burden statistic may not have mean zero after screening, we develop a novel bootstrap procedure for assessing the significance of the burden statistic. We demonstrate through extensive simulation studies that the proposed tests are robust to a wide range of differential sequencing qualities between cases and controls, and are at least as powerful as the standard genotype calling approach when the latter controls type I error. An application to the UK10K data reveals novel rare variants in gene BTBD18 associated with childhood onset obesity. The relevant software is freely available.


Subjects
Genetic Variation/genetics, High-Throughput Nucleotide Sequencing, Likelihood Functions, DNA Sequence Analysis, Algorithms, Case-Control Studies, Genotype, Humans, Phenotype, Single Nucleotide Polymorphism, Software
13.
Genet Epidemiol ; 41(5): 375-387, 2017 07.
Article in English | MEDLINE | ID: mdl-28560825

ABSTRACT

A fundamental challenge in analyzing next-generation sequencing (NGS) data is to determine an individual's genotype accurately, as the accuracy of the inferred genotype is essential to downstream analyses. Correctly estimating the base-calling error rate is critical to accurate genotype calls. Phred scores that accompany each call can be used to decide which calls are reliable. Some genotype callers, such as GATK and SAMtools, directly calculate the base-calling error rates from phred scores or recalibrated base quality scores. Others, such as SeqEM, estimate error rates from the read data without using any quality scores. It is also a common quality control procedure to filter out reads with low phred scores. However, choosing an appropriate phred score threshold is problematic as a too high threshold may lose data, while a too low threshold may introduce errors. We propose a new likelihood-based genotype-calling approach that exploits all reads and estimates the per-base error rates by incorporating phred scores through a logistic regression model. The approach, which we call PhredEM, uses the expectation-maximization (EM) algorithm to obtain consistent estimates of genotype frequencies and logistic regression parameters. It also includes a simple, computationally efficient screening algorithm to identify loci that are estimated to be monomorphic, so that only loci estimated to be nonmonomorphic require application of the EM algorithm. Like GATK, PhredEM can be used together with a linkage-disequilibrium-based method such as Beagle, which can further improve genotype calling as a refinement step. We evaluate the performance of PhredEM using both simulated data and real sequencing data from the UK10K project and the 1000 Genomes project. The results demonstrate that PhredEM performs better than either GATK or SeqEM, and that PhredEM is an improved, robust, and widely applicable genotype-calling approach for NGS studies. The relevant software is freely available.


Subjects
Genomics/methods, Genotype, High-Throughput Nucleotide Sequencing/methods, Single Nucleotide Polymorphism/genetics, DNA Sequence Analysis/methods, Software, Algorithms, Genetic Databases, Humans, Genetic Models
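
As context for the abstract above, a Phred score Q conventionally encodes a base-calling error probability of 10^(-Q/10). The sketch below shows that conversion alongside a logistic link of the general kind the abstract describes. The intercept and slope are hypothetical placeholders, not PhredEM estimates, and the function names are assumptions.

```python
import math

def phred_to_error_prob(q):
    """Nominal error probability implied by a Phred quality score."""
    return 10.0 ** (-q / 10.0)

def logistic_error_prob(q, intercept, slope):
    """Error probability from a logistic model in the Phred score.

    `intercept` and `slope` are hypothetical; in an EM-based approach they
    would be estimated from the read data rather than fixed in advance.
    """
    return 1.0 / (1.0 + math.exp(-(intercept + slope * q)))

nominal = phred_to_error_prob(20)
# With intercept 0 and slope -ln(10)/10 the odds of error equal 10^(-Q/10),
# so the logistic model stays close to the nominal Phred interpretation.
modelled = logistic_error_prob(20, intercept=0.0, slope=-math.log(10) / 10)
```

Letting the data determine the intercept and slope allows the fitted error rates to depart from the nominal values when the reported quality scores are miscalibrated, without discarding low-quality reads.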
14.
Am J Hum Genet ; 96(4): 543-54, 2015 Apr 02.
Article in English | MEDLINE | ID: mdl-25799106

ABSTRACT

Sequencing and exome-chip technologies have motivated development of novel statistical tests to identify rare genetic variation that influences complex diseases. Although many rare-variant association tests exist for case-control or cross-sectional studies, far fewer methods exist for testing association in families. This is unfortunate, because cosegregation of rare variation and disease status in families can amplify association signals for rare variants. Many researchers have begun sequencing (or genotyping via exome chips) familial samples that were either recently collected or previously collected for linkage studies. Because many linkage studies of complex diseases sampled affected sibships, we propose a strategy for association testing of rare variants for use in this study design. The logic behind our approach is that rare susceptibility variants should be found more often on regions shared identical by descent by affected sibling pairs than on regions not shared identical by descent. We propose both burden and variance-component tests of rare variation that are applicable to affected sibships of arbitrary size and that do not require genotype information from unaffected siblings or independent controls. Our approaches are robust to population stratification and produce analytic p values, thereby enabling our approach to scale easily to genome-wide studies of rare variation. We illustrate our methods by using simulated data and exome chip data from sibships ascertained for hypertension collected as part of the Genetic Epidemiology Network of Arteriopathy (GENOA) study.


Subjects
Genetic Association Studies/methods, Genetic Variation/genetics, Statistical Models, Mutation Rate, Rare Diseases/genetics, Siblings, Black or African American/genetics, Computer Simulation, Humans, Hypertension/genetics
15.
Stat Med ; 37(23): 3357-3372, 2018 10 15.
Article in English | MEDLINE | ID: mdl-29923344

ABSTRACT

Multisample U-statistics encompass a wide class of test statistics that allow the comparison of 2 or more distributions. U-statistics are especially powerful because they can be applied to both numeric and nonnumeric data, eg, ordinal and categorical data where a pairwise similarity or distance-like measure between categories is available. However, when comparing the distribution of a variable across 2 or more groups, observed differences may be due to confounding covariates. For example, in a case-control study, the distribution of exposure in cases may differ from that in controls entirely because of variables that are related to both exposure and case status and are distributed differently among case and control participants. We propose to use individually reweighted data (ie, using the stratification score for retrospective data or the propensity score for prospective data) to construct adjusted U-statistics that can test the equality of distributions across 2 (or more) groups in the presence of confounding covariates. Asymptotic normality of our adjusted U-statistics is established and a closed form expression of their asymptotic variance is presented. The utility of our approach is demonstrated through simulation studies, as well as in an analysis of data from a case-control study conducted among African-Americans, comparing whether the similarity in haplotypes (ie, sets of adjacent genetic loci inherited from the same parent) occurring in a case and a control participant differs from the similarity in haplotypes occurring in 2 control participants.


Subjects
Statistical Models, Black or African American/genetics, Analysis of Variance, Biostatistics, Case-Control Studies, Catechol O-Methyltransferase/genetics, Computer Simulation, Haplotypes, Humans, Propensity Score, Prospective Studies, Retrospective Studies, Schizophrenia/genetics
16.
Am J Hum Genet ; 94(6): 845-53, 2014 Jun 05.
Article in English | MEDLINE | ID: mdl-24836453

ABSTRACT

There is great interest in detecting associations between human traits and rare genetic variation. To address the low power implicit in single-locus tests of rare genetic variants, many rare-variant association approaches attempt to accumulate information across a gene, often by taking linear combinations of single-locus contributions to a statistic. Using the right linear combination is key: an optimal test will up-weight true causal variants, down-weight neutral variants, and correctly assign the direction of effect for causal variants. Here, we propose a procedure that exploits data from population controls to estimate the linear combination to be used in a case-parent trio rare-variant association test. Specifically, we estimate the linear combination by comparing population control allele frequencies with allele frequencies in the parents of affected offspring. These estimates are then used to construct a rare-variant transmission disequilibrium test (rvTDT) in the case-parent data. Because the rvTDT is conditional on the parents' data, using parental data in estimating the linear combination does not affect the validity or asymptotic distribution of the rvTDT. By using simulation, we show that our new population-control-based rvTDT can dramatically improve power over rvTDTs that do not use population control information across a wide variety of genetic architectures. It also remains valid under population stratification. We apply the approach to a cohort of epileptic encephalopathy (EE) trios and find that dominant (or additive) inherited rare variants are unlikely to play a substantial role within EE genes previously identified through de novo mutation studies.


Subjects
Epilepsy/genetics, Population Genetics/methods, Genome-Wide Association Study/methods, Rare Diseases/genetics, Computer Simulation, Control Groups, Gene Frequency, Genetic Loci, Genetic Predisposition to Disease, Genetic Variation, Genotype, Humans, Linkage Disequilibrium, Genetic Models, Parents, Phenotype, Rare Diseases/diagnosis
17.
Microb Ecol Health Dis ; 28(1): 1303265, 2017.
Article in English | MEDLINE | ID: mdl-28572753

ABSTRACT

Background: Recent studies of various human microbiome habitats have revealed thousands of bacterial species and the existence of large variation in communities of microorganisms in the same habitats across individual human subjects. Previous efforts to summarize this diversity, notably in the human gut and vagina, have categorized microbiome profiles by clustering them into community state types (CSTs). The functional relevance of specific CSTs has not been established. Objective: We investigate whether CSTs can be used to assess dynamics in the microbiome. Design: We conduct a re-analysis of five sequencing-based microbiome surveys derived from vaginal samples with repeated measures. Results: We observe that detection of a CST transition is largely insensitive to choices in methods for normalization or clustering. We find that healthy subjects persist in a CST for two to three weeks or more on average, while those with evidence of dysbiosis tend to change more often. Changes in CST can be gradual or occur over less than one day. Upcoming CST changes and switches to high-risk CSTs can be predicted with high accuracy in certain scenarios. Finally, we observe that presence of Gardnerella vaginalis is a strong predictor of an upcoming CST change. Conclusion: Overall, our results show that the CST concept is useful for studying microbiome dynamics.
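As a rough illustration of the quantities discussed in the abstract above (CST transitions and how long a subject persists in a CST), the following Python sketch works on one subject's sequence of CST labels. It assumes evenly spaced samples and uses hypothetical helper names; it is not the analysis pipeline used in the study.

```python
from itertools import groupby


def cst_runs(labels):
    """Collapse a time-ordered label sequence into (CST, run length) pairs."""
    return [(cst, sum(1 for _ in group)) for cst, group in groupby(labels)]


def transition_count(labels):
    """Number of times the subject switches from one CST to another."""
    return max(len(cst_runs(labels)) - 1, 0)


def mean_dwell(labels):
    """Average number of consecutive samples spent in a CST.

    Assumption: samples are evenly spaced, so run length is proportional
    to time. The first and last runs are censored and bias this downward.
    """
    runs = cst_runs(labels)
    return sum(length for _, length in runs) / len(runs) if runs else 0.0
```

For daily samples labelled `["I", "I", "I", "IV", "IV", "I"]`, this gives two transitions and a mean dwell of two samples.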

18.
Genet Epidemiol ; 38 Suppl 1: S49-56, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25112188

ABSTRACT

In the past decade, genome-wide association studies have been successful in identifying genetic loci that play a role in many complex diseases. Despite this, it has become clear that for many traits, investigation of single common variants does not give a complete picture of the genetic contribution to the phenotype. Therefore a number of new approaches are currently being investigated to further the search for susceptibility loci or regions. We summarize the contributions to Genetic Analysis Workshop 18 (GAW18) that concern this search using methods for population-based association analysis. Many of the members of our GAW18 working group made use of data types that have only recently become available through the use of next-generation sequencing technologies, with many focusing on the investigation of rare variants instead of or in combination with common variants. Some contributors used a haplotype-based approach, which to date has been used relatively infrequently but may become more important for analyzing rare variant association data. Others analyzed gene-gene or gene-environment interactions, where novel statistical approaches were needed to make the best use of the available information without requiring an excessive computational burden. GAW18 provided participants with the chance to make use of state-of-the-art data, statistical techniques, and technology. We report here some of the experiences and conclusions that were reached by workshop participants who analyzed the GAW18 data as a population-based association study.


Subjects
Gene-Environment Interaction , Genome-Wide Association Study , Blood Pressure/genetics , Genetic Variation , Population Genetics , Haplotypes , High-Throughput Nucleotide Sequencing , Humans , Phenotype , Single Nucleotide Polymorphism , DNA Sequence Analysis
19.
Am J Hum Genet ; 91(2): 215-23, 2012 Aug 10.
Article in English | MEDLINE | ID: mdl-22818855

ABSTRACT

Many case-control tests of rare variation are implemented in statistical frameworks that make correction for confounders like population stratification difficult. Simple permutation of disease status is unacceptable for resolving this issue because the replicate data sets do not have the same confounding as the original data set. These limitations make it difficult to apply rare-variant tests to samples in which confounding most likely exists, e.g., samples collected from admixed populations. To enable the use of such rare-variant methods in structured samples, as well as to facilitate permutation tests for any situation in which case-control tests require adjustment for confounding covariates, we propose to establish the significance of a rare-variant test via a modified permutation procedure. Our procedure uses Fisher's noncentral hypergeometric distribution to generate permuted data sets with the same structure present in the actual data set such that inference is valid in the presence of confounding factors. We use simulated sequence data based on coalescent models to show that our permutation strategy corrects for confounding due to population stratification that, if ignored, would otherwise inflate the size of a rare-variant test. We further illustrate the approach by using sequence data from the Dallas Heart Study of energy metabolism traits. Researchers can implement our permutation approach by using the R package BiasedUrn.


Subjects
Case-Control Studies , Epidemiologic Confounding Factors , Statistical Data Interpretation , Genetic Variation , Rare Diseases/genetics , Software , Computer Simulation , Humans , Genetic Models , Molecular Sequence Data
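The permutation idea in the abstract above can be sketched in Python. The sketch assumes that each subject has an estimated odds of being a case given the confounders (how those odds are obtained is outside this sketch), and it draws a relabelling with the observed number of cases so that a label vector's probability is proportional to the product of the odds of the subjects labelled as cases. This corresponds to a multivariate Fisher's noncentral hypergeometric draw with one subject per category. The function name `sample_case_labels` is hypothetical, and this is not the BiasedUrn implementation.

```python
import random


def sample_case_labels(odds, n_cases, rng):
    """Draw 0/1 case labels with exactly n_cases ones.

    P(labels) is proportional to the product of odds[i] over subjects
    with labels[i] == 1. Odds must be positive.
    """
    n = len(odds)
    if not 0 <= n_cases <= n:
        raise ValueError("n_cases must be between 0 and len(odds)")
    # R[j][r]: sum, over all size-r subsets of subjects j..n-1,
    # of the product of their odds (elementary symmetric polynomial).
    R = [[0.0] * (n_cases + 1) for _ in range(n + 1)]
    for j in range(n + 1):
        R[j][0] = 1.0
    for j in range(n - 1, -1, -1):
        for r in range(1, n_cases + 1):
            R[j][r] = R[j + 1][r] + odds[j] * R[j + 1][r - 1]
    labels = [0] * n
    r = n_cases  # cases still to be assigned
    for j in range(n):
        if r == 0:
            break
        # Probability that subject j is a case given r cases remain
        # to be placed among subjects j..n-1.
        p = odds[j] * R[j + 1][r - 1] / R[j][r]
        if rng.random() < p:
            labels[j] = 1
            r -= 1
    return labels
```

Calling `sample_case_labels(odds, n_cases, random.Random(seed))` repeatedly and recomputing the test statistic on each relabelled data set gives a permutation null distribution. With all odds equal, the draw reduces to an ordinary permutation of disease status. For large samples the subset sums can overflow, so a practical version would work on a log scale.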
20.
Genome Res ; 22(4): 623-32, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22300631

ABSTRACT

DNA methylation (DNAm) plays diverse roles in human biology, but this dynamic epigenetic mark remains far from fully characterized. Although earlier studies uncovered loci that undergo age-associated DNAm changes in adults, little is known about such changes during childhood. Despite profound DNAm plasticity during embryogenesis, monozygotic twins show indistinguishable childhood methylation, suggesting that DNAm is highly coordinated throughout early development. Here we examine the methylation of 27,578 CpG dinucleotides in peripheral blood DNA from a cross-sectional study of 398 boys, aged 3-17 yr, and find significant age-associated changes in DNAm at 2078 loci. These findings correspond well with pyrosequencing data and replicate in a second pediatric population (N = 78). Moreover, we report a deficit of age-related loci on the X chromosome, a preference for specific nucleotides immediately surrounding the interrogated CpG dinucleotide, and a primary association with developmental and immune ontological functions. Meta-analysis (N = 1158) with two adult populations reveals that despite a significant overlap of age-associated loci, most methylation changes do not follow a lifelong linear pattern due to a threefold to fourfold higher rate of change in children compared with adults; consequently, the vast majority of changes are more accurately modeled as a function of logarithmic age. We therefore conclude that age-related DNAm changes in peripheral blood occur more rapidly during childhood and are imperfectly accounted for by statistical corrections that are linear in age, further suggesting that future DNAm studies should be matched closely for age.


Subjects
CpG Islands/genetics , DNA Methylation , Gene Expression Profiling , Human Genome/genetics , Adolescent , Adult , Age Factors , Binding Sites/genetics , Child , Preschool Child , Cross-Sectional Studies , Humans , Male , Meta-Analysis as Topic , Genetic Models , Oligonucleotide Array Sequence Analysis/methods , DNA Sequence Analysis/methods
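The abstract's point that age-related methylation changes are better modeled on logarithmic age than on linear age can be illustrated with a small Python comparison. The methylation values below are synthetic and noise-free, generated purely for illustration over the 3 to 17 year range mentioned in the abstract. They are not data from the study, and the helper `fit_r2` is a hypothetical name.

```python
import math


def fit_r2(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, R^2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Synthetic example (assumption): methylation changes linearly in log(age).
ages = list(range(3, 18))
methylation = [0.40 + 0.10 * math.log(a) for a in ages]

_, _, r2_linear = fit_r2(ages, methylation)
_, _, r2_log = fit_r2([math.log(a) for a in ages], methylation)
```

When the underlying change slows with age, as it does here, a covariate that is linear in age leaves systematic residual structure that the log-age covariate removes, which is why `r2_log` exceeds `r2_linear` in this example.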