Results 1 - 12 of 12
1.
Mol Biol Evol ; 35(11): 2805-2818, 2018 11 01.
Article in English | MEDLINE | ID: mdl-30137463

ABSTRACT

Phylogeny estimation is difficult for closely related populations and species, especially if they have been exchanging genes. We present a hierarchical Bayesian, Markov-chain Monte Carlo method with a state space that includes all possible phylogenies in a full Isolation-with-Migration model framework. The method is based on a new type of genealogy augmentation called a "hidden genealogy" that enables efficient updating of the phylogeny. This is the first likelihood-based method to fully incorporate directional gene flow and genetic drift for estimation of a species or population phylogeny. Application to human hunter-gatherer populations from Africa revealed a clear phylogenetic history, with strong support for gene exchange with an unsampled ghost population, and relatively ancient divergence between a ghost population and modern human populations, consistent with human/archaic divergence. In contrast, a study of five chimpanzee populations reveals a clear phylogeny with several pairs of populations having exchanged DNA, but does not support a history with an unsampled ghost population.


Subjects
Gene Flow, Genetic Techniques, Phylogeny, Animals, Bayes Theorem, Genetic Drift, Human Migration, Humans, Monte Carlo Method, Pan troglodytes/genetics
2.
Mol Biol Evol ; 34(6): 1517-1528, 2017 06 01.
Article in English | MEDLINE | ID: mdl-28333230

ABSTRACT

We present a new Bayesian method for estimating demographic and phylogenetic history using population genomic data. Several key innovations are introduced that allow the study of diverse models within an Isolation-with-Migration framework. The new method implements a 2-step analysis, with an initial Markov chain Monte Carlo (MCMC) phase that samples simple coalescent trees, followed by the calculation of the joint posterior density for the parameters of a demographic model. In step 1, the MCMC sampling phase, the method uses a reduced state space, consisting of coalescent trees without migration paths, and a simple importance sampling distribution without the demography of interest. Once obtained, a single sample of trees can be used in step 2 to calculate the joint posterior density for model parameters under multiple diverse demographic models, without having to repeat MCMC runs. Because migration paths are not included in the state space of the MCMC phase, but rather are handled by analytic integration in step 2 of the analysis, the method is scalable to a large number of loci with excellent MCMC mixing properties. With an implementation of the new method in the computer program MIST, we demonstrate the method's accuracy, scalability, and other advantages using simulated data and DNA sequences of two common chimpanzee subspecies: Pan troglodytes (P. t.) troglodytes and P. t. verus.


Subjects
Bayes Theorem, Genomics/methods, Algorithms, Biological Evolution, Demography, Molecular Evolution, Genetic Variation/genetics, Markov Chains, Genetic Models, Monte Carlo Method, Phylogeny, Software
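The reuse of a single sample under several models, as described in the abstract above, rests on the general idea of importance sampling. The following is only a toy sketch of that idea, not the MIST algorithm: it draws values once from a simple proposal distribution and then reweights the same draws to estimate a mean under two different target densities. The normal densities, the names `model_a` and `model_b`, and the function names are assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def reweighted_mean(samples, proposal_pdf, target_pdf):
    """Self-normalized importance-sampling estimate of the mean under target_pdf."""
    weights = [target_pdf(x) / proposal_pdf(x) for x in samples]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, samples)) / total


# Step 1 (done once): sample from a broad, simple proposal distribution.
rng = random.Random(0)
samples = [rng.gauss(0.0, 3.0) for _ in range(20000)]


def proposal(x):
    return normal_pdf(x, 0.0, 3.0)


# Step 2 (repeated per model): reweight the same samples under each target.
# These two targets are hypothetical stand-ins for "diverse models".
targets = {
    "model_a": lambda x: normal_pdf(x, 1.0, 1.0),
    "model_b": lambda x: normal_pdf(x, -2.0, 0.5),
}
estimates = {
    name: reweighted_mean(samples, proposal, pdf) for name, pdf in targets.items()
}
print(estimates)
```

The point of the sketch is only that the expensive sampling step does not depend on the target, so adding a further model costs one more pass of reweighting rather than a new sampling run.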
3.
Value Health ; 21(8): 967-972, 2018 08.
Article in English | MEDLINE | ID: mdl-30098675

ABSTRACT

BACKGROUND: In 2016, the Food and Drug Administration (FDA) released a Pilot Clinical Outcome Assessment Compendium (COA Compendium) intended to foster patient-focused drug development (PFDD). However, it is unclear whether patient perspectives were solicited during development or validation of the included patient-reported outcome (PRO) measures. OBJECTIVE: To examine the pedigree of a sample of measures included in the COA Compendium. METHODS: PROs included in chapters 1 or 2 of the COA Compendium were extracted, and three reviewers independently searched PubMed and Google to identify information on measure pedigree. The method and stage of measure development at which patient engagement took place were documented. RESULTS: Among the 26 evaluated PRO measures, we were unable to identify information on development or validation for nearly half the sample (n = 12). Among the remaining 14 measures, 5 did not include any evidence of patient engagement; 2 engaged patients during concept elicitation only; 1 engaged patients during psychometric validation only; and 6 engaged patients during both concept elicitation and cognitive interviewing. Measures either previously qualified or submitted for qualification were more likely to include patient engagement. CONCLUSIONS: For the FDA Pilot COA Compendium to fulfill its purpose of fostering PFDD, it needs fine-tuning to reflect today's standards, improving transparency and facilitating clear identification of included measures so that the level of patient engagement, among other factors, can be properly assessed. Suggested improvements include identifying clinical trials that correspond to the COA Compendium's use in drug development; more clearly identifying which measure is referred to; and including only those measures that are already qualified or undergoing qualification.


Subjects
Health Care Outcome Assessment/methods, Patient Participation/methods, Patient Reported Outcome Measures, Humans, Pilot Projects, United States, United States Food and Drug Administration/organization & administration
4.
Genomics ; 107(2-3): 76-82, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26721311

ABSTRACT

Laryngeal cancer disproportionately affects African-Americans compared with European-Americans. Here, we analyze the genome-wide somatic point mutations from the tumors of 13 African-Americans and 57 European-Americans from TCGA to differentiate between environmental and ancestrally-inherited factors. The mean number of mutations was different between African-Americans (151.31) and European-Americans (277.63). Other differences in the overall mutational landscape between African-Americans and European-Americans were also found. The frequencies of C>A and C>G mutations were significantly different between the two populations (p-value<0.05). Context nucleotide signatures for some mutation types significantly differ between these two populations. Thus, the context nucleotide signatures along with other factors could be related to the observed mutational landscape differences between the two races. Finally, we show that mutated genes associated with these mutational differences differ between the two populations. Thus, at the molecular level, race appears to be a factor in the progression of laryngeal cancer with ancestral genomic signatures best explaining these differences.


Subjects
Black or African American/genetics, Genetic Predisposition to Disease/ethnology, Laryngeal Neoplasms/genetics, Point Mutation, Gene Frequency, Population Genetics, Humans, Laryngeal Neoplasms/ethnology, United States/ethnology, White People/genetics
5.
Mol Ecol ; 24(20): 5078-83, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26456794

ABSTRACT

The population genetic study of divergence is often carried out using a Bayesian genealogy sampler, like those implemented in ima2 and related programs, and these analyses frequently include a likelihood ratio test of the null hypothesis of no migration between populations. Cruickshank and Hahn (2014, Molecular Ecology, 23, 3133-3157) recently reported a high rate of false-positive test results with ima2 for data simulated with small numbers of loci under models with no migration and recent splitting times. We confirm these findings and discover that they are caused by a failure of the assumptions underlying likelihood ratio tests that arises when using marginal likelihoods for a subset of model parameters. We also show that for small data sets, with little divergence between samples from two populations, an excellent fit can often be found both by a model with a low migration rate and a recent splitting time and by a model with a high migration rate and a deep splitting time.


Subjects
Gene Flow, Genetic Speciation, Genomic Islands, Genetic Models, Animals
6.
Genomics Inform ; 21(2): e27, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37415456

ABSTRACT

Recombination events complicate the evolutionary history of populations and species and have a significant impact on the inference of isolation-with-migration (IM) models. However, several existing methods were developed under the assumption of no recombination within a locus and free recombination between loci. In this study, we investigated the effect of recombination on the estimation of IM models using genomic data. We conducted a simulation study to evaluate the consistency of the parameter estimators with up to 1,000 loci and analyzed true gene trees to examine the sources of errors in estimating the IM model parameters. The results showed that the presence of recombination led to biased estimates of the IM model parameters, with population sizes increasingly overestimated and migration rates increasingly underestimated as the number of loci increased. The magnitude of the biases tended to increase with the recombination rate when using 100 or more loci. On the other hand, the estimation of splitting times remained consistent as the number of loci increased. In the absence of recombination, the estimators of the IM model parameters remained consistent.

7.
Syst Biol ; 60(3): 261-75, 2011 May.
Article in English | MEDLINE | ID: mdl-21368324

ABSTRACT

With the increasing interest in recognizing the discordance between gene genealogies, various gene tree/species tree reconciliation methods have been developed. We present here the first attempt to assess and compare two such Bayesian methods, Bayesian estimation of species trees (BEST) and BUCKy (Bayesian untangling of concordance knots), in the presence of several known processes of gene tree discordance. DNA alignments were simulated under the influence of incomplete lineage sorting (ILS) and of horizontal gene transfer (HGT). BEST and BUCKy both account for uncertainty in gene tree estimation but differ substantially in their assumptions of what caused gene tree discordance. BEST estimates a species tree using the coalescent model, assuming that all gene tree discordance is due to ILS. BUCKy does not assume any specific biological process of gene tree discordance through the use of a nonparametric clustering of concordant genes. BUCKy estimates the concordance factor (CF) of a clade, which is defined as the proportion of genes that truly have the clade in their trees. The estimated concordance tree is then built from clades with the highest estimated CFs. Because of their different assumptions, it was expected that BEST would perform better in the presence of ILS and that BUCKy would perform better in the presence of HGT. As expected, the species tree was more accurately reconstructed by BUCKy in the presence of HGT, when the HGT events were unevenly placed across the species tree. BUCKy and BEST performed similarly in most other cases, including in the presence of strong ILS and of HGT events that were evenly placed across the tree. However, BUCKy was shown to underestimate the uncertainty in CF estimation, with short credibility intervals. Despite this, the discordance pattern estimated by BUCKy could be compared with the signature of ILS. The resulting test for the adequacy of the coalescent model proved to have low Type I error. It was powerful when HGT was the major source of discordance and when HGT events were unevenly placed across the species tree.


Subjects
Bayes Theorem, Computational Biology/methods, Horizontal Gene Transfer, Phylogeny, Software, Computer Simulation, Genetic Speciation
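The definition of the concordance factor given in the abstract above (the proportion of genes that have a clade in their trees) can be illustrated with a small sketch. This is not BUCKy itself, which estimates CFs while accounting for uncertainty in the gene trees; here the gene trees are simply assumed to be known, and each tree is represented, as an assumption of this example, by the set of its clades. The taxon labels A to D and the function name are hypothetical.

```python
def concordance_factor(clade, gene_trees):
    """Fraction of gene trees that contain the given clade.

    Each gene tree is a set of clades; each clade is a frozenset of taxon names.
    """
    clade = frozenset(clade)
    hits = sum(1 for tree in gene_trees if clade in tree)
    return hits / len(gene_trees)


# Four hypothetical gene trees on taxa A, B, C, D (trivial clades omitted).
gene_trees = [
    {frozenset("AB"), frozenset("ABC")},  # (((A,B),C),D)
    {frozenset("AB"), frozenset("ABC")},  # (((A,B),C),D)
    {frozenset("BC"), frozenset("ABC")},  # (((B,C),A),D)
    {frozenset("AB"), frozenset("CD")},   # ((A,B),(C,D))
]

cf_ab = concordance_factor("AB", gene_trees)
cf_bc = concordance_factor("BC", gene_trees)
print(cf_ab, cf_bc)
```

Under these toy trees, the clade {A, B} appears in three of the four gene trees and {B, C} in one, so a concordance tree built from the highest-CF clades would include {A, B}.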
8.
Genomics Inform ; 20(3): e34, 2022 Sep.
Article in English | MEDLINE | ID: mdl-36239111

ABSTRACT

Multilevel analysis is an appropriate and powerful tool for analyzing hierarchically structured data and is widely applied in fields ranging from public health to genomics. In practice, however, we may lose the information on multiple nesting levels in the multilevel analysis since data may fail to capture all levels of hierarchy, or the top or intermediate levels of hierarchy are ignored in the analysis. In this study, we consider a multilevel linear mixed effect model (LMM) with single imputation that can involve all data hierarchy levels in the presence of missing top or intermediate-level clusters. We evaluate and compare the performance of a multilevel LMM with single imputation with other models ignoring the data hierarchy or missing intermediate-level clusters. To this end, we applied a multilevel LMM with single imputation and other models to hierarchically structured cohort data with some intermediate levels missing and to simulated data with various cluster sizes and missing rates of intermediate-level clusters. A thorough simulation study demonstrated that an LMM with single imputation estimates fixed coefficients and variance components of a multilevel model more accurately than other models ignoring data hierarchy or missing clusters in terms of mean squared error and coverage probability. In particular, when models ignoring data hierarchy or missing clusters were applied, the variance components of random effects were overestimated. We observed similar results from the analysis of hierarchically structured cohort data.

9.
Genomics Inform ; 17(4): e37, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31896237

ABSTRACT

Isolation-with-migration (IM) models have become popular for explaining population divergence in the presence of migration. Bayesian methods are commonly used to estimate IM models, but they are limited to small data analysis or simple model inference. Recently, three methods, IMa3, MIST and AIM, resolved these limitations. Here, we describe the major problems addressed by these three software packages and compare differences among their inference methods, despite their use of the same standard likelihood function.

10.
Bioinformatics ; 23(1): 71-6, 2007 Jan 01.
Article in English | MEDLINE | ID: mdl-17092990

ABSTRACT

MOTIVATION: The identification and characterization of genes that increase the susceptibility to common complex multifactorial diseases is a challenging task in genetic association studies. The multifactor dimensionality reduction (MDR) method has been proposed and implemented by Ritchie et al. (2001) to identify the combinations of multilocus genotypes and discrete environmental factors that are associated with a particular disease. However, the original MDR method classifies the combination of multilocus genotypes into high-risk and low-risk groups in an ad hoc manner based on a simple comparison of the ratios of the number of cases and controls. Hence, the MDR approach is prone to false positive and negative errors when the ratio of the number of cases and controls in a combination of genotypes is similar to that in the entire data, or when the numbers of both cases and controls are small. We therefore propose the odds ratio based multifactor dimensionality reduction (OR MDR) method that uses the odds ratio as a new quantitative measure of disease risk. RESULTS: While the original MDR method provides a simple binary measure of risk, the OR MDR method provides not only the odds ratio as a quantitative measure of risk but also the ordering of the multilocus combinations from the highest risk to lowest risk groups. Furthermore, the OR MDR method provides a confidence interval for the odds ratio for each multilocus combination, which is extremely informative in judging its importance as a risk factor. The proposed OR MDR method is illustrated using the dataset obtained from the CDC Chronic Fatigue Syndrome Research Group. AVAILABILITY: The program written in R is available.


Subjects
Genetic Models, Statistical Models, Odds Ratio, Protein Interaction Mapping/methods, Chronic Fatigue Syndrome/genetics, Genetic Variation, Humans, Membrane Transport Proteins/genetics, Genetic Polymorphism, Risk Assessment, Transcription Factors/analysis, Transcription Factors/genetics
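To make concrete the idea in the abstract above of an odds ratio with a confidence interval for one multilocus combination, the following sketch uses the standard 2x2 odds ratio and a Wald interval on the log scale. It is not the authors' R program. It assumes that the combination of interest is compared against all other combinations pooled together, that all four counts are positive, and that a 95% interval (z = 1.96) is wanted; the counts and the function name are hypothetical.

```python
import math


def odds_ratio_with_ci(cases_in, controls_in, cases_out, controls_out, z=1.96):
    """Odds ratio and Wald confidence interval for one genotype combination.

    cases_in / controls_in: counts carrying the combination.
    cases_out / controls_out: counts in all other combinations (an assumption
    of this sketch). All four counts must be positive.
    """
    odds_ratio = (cases_in * controls_out) / (controls_in * cases_out)
    # Standard error of log(OR) for a 2x2 table.
    se = math.sqrt(
        1 / cases_in + 1 / controls_in + 1 / cases_out + 1 / controls_out
    )
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts: 30 cases and 10 controls carry the combination,
# 70 cases and 90 controls do not.
or_value, ci_low, ci_high = odds_ratio_with_ci(30, 10, 70, 90)
print(or_value, ci_low, ci_high)
```

In this toy example the whole interval lies above 1, which is the kind of evidence the abstract describes as informative when judging a combination as a risk factor; sorting combinations by their odds ratios would give the ordering from highest to lowest risk.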
11.
Bioinformatics ; 23(19): 2589-95, 2007 Oct 01.
Article in English | MEDLINE | ID: mdl-17872915

ABSTRACT

MOTIVATION: The identification and characterization of susceptibility genes that influence the risk of common and complex diseases remains a statistical and computational challenge in genetic association studies. This is partly because the effect of any single genetic variant for a common and complex disease may be dependent on other genetic variants (gene-gene interaction) and environmental factors (gene-environment interaction). To address this problem, the multifactor dimensionality reduction (MDR) method has been proposed by Ritchie et al. to detect gene-gene interactions or gene-environment interactions. The MDR method identifies polymorphism combinations associated with the common and complex multifactorial diseases by collapsing high-dimensional genetic factors into a single dimension. That is, the MDR method classifies the combination of multilocus genotypes into high-risk and low-risk groups based on a comparison of the ratios of the numbers of cases and controls. When a high-order interaction model is considered with multi-dimensional factors, however, there may be many sparse or empty cells in the contingency tables. The MDR method cannot classify an empty cell as high risk or low risk and leaves it as undetermined. RESULTS: In this article, we propose the log-linear model-based multifactor dimensionality reduction (LM MDR) method to improve the MDR in classifying sparse or empty cells. The LM MDR method estimates frequencies for empty cells from a parsimonious log-linear model so that they can be assigned to high- and low-risk groups. In addition, LM MDR includes MDR as a special case when the saturated log-linear model is fitted. Simulation studies show that the LM MDR method has greater power and smaller error rates than the MDR method. The LM MDR method is also compared with the MDR method using sporadic Alzheimer's disease as an example.


Subjects
Alzheimer Disease/genetics, Genetic Predisposition to Disease/genetics, Genetic Models, Multigene Family/genetics, Nerve Tissue Proteins/genetics, Protein Interaction Mapping/methods, Risk Assessment/methods, Algorithms, Computer Simulation, Humans, Statistics as Topic
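The classification step that the abstract above attributes to the original MDR method, including the empty-cell problem that motivates LM MDR, can be sketched as follows. This is a minimal illustration rather than the published implementation: it assumes that a cell is labeled high risk when its case/control ratio is at least the overall case/control ratio in the data, and the genotype labels and counts are hypothetical. The log-linear estimation used by LM MDR is not shown.

```python
def classify_cells(cells):
    """Label each multilocus genotype cell as 'high', 'low', or 'undetermined'.

    cells maps a genotype combination to a (cases, controls) pair.
    The threshold is the overall case/control ratio (an assumption of this
    sketch); the comparison is cross-multiplied to avoid dividing by zero.
    """
    total_cases = sum(cases for cases, _ in cells.values())
    total_controls = sum(controls for _, controls in cells.values())
    labels = {}
    for genotype, (cases, controls) in cells.items():
        if cases == 0 and controls == 0:
            # An empty cell has no ratio, so MDR cannot classify it.
            labels[genotype] = "undetermined"
        elif cases * total_controls >= controls * total_cases:
            labels[genotype] = "high"
        else:
            labels[genotype] = "low"
    return labels


# Hypothetical two-locus table with one empty cell.
cells = {
    ("AA", "BB"): (12, 4),
    ("AA", "Bb"): (3, 9),
    ("Aa", "BB"): (5, 5),
    ("aa", "bb"): (0, 0),
}
labels = classify_cells(cells)
print(labels)
```

The last cell shows the gap the abstract describes: with no observations there is nothing to compare, which is why LM MDR substitutes frequencies estimated from a log-linear model before assigning a risk group.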
12.
Article in English | MEDLINE | ID: mdl-24384712

ABSTRACT

Ancestral recombination events can cause the underlying genealogy of a site to vary along the genome. We consider Bayesian models to simultaneously detect recombination breakpoints in very long sequence alignments and estimate the phylogenetic tree of each block between breakpoints. The models we consider use a dissimilarity measure between trees in their prior distribution to favor similar trees at neighboring loci. We show empirical evidence in Enterobacteria that neighboring genomic regions have similar trees. The main hurdle in using such models is the need to properly calculate the normalizing function for the prior probabilities on trees. In this work, we quantify the impact of approximating this normalizing function as done in biomc2, a hierarchical Bayesian method to detect recombination based on distance between tree topologies. We then derive an algorithm to calculate the normalizing function exactly, for a Gibbs distribution based on the Robinson-Foulds (RF) distance between gene trees at neighboring loci. At the core is the calculation of the joint distribution of the shape of a random tree and its RF distance to a fixed tree. We also propose fast approximations to the normalizing function, which are shown to be very accurate with little impact on the Bayesian inference.


Subjects
Biological Evolution, Chromosome Mapping/methods, DNA Mutational Analysis/methods, Enterobacteriaceae/genetics, Genetic Recombination/genetics, DNA Sequence Analysis/methods, Base Sequence, Molecular Sequence Data, Statistical Distributions
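The role of the normalizing function mentioned in the abstract above can be seen in a toy case small enough to enumerate. The sketch below is not biomc2 and not the exact algorithm derived in the paper. It assumes rooted trees represented as sets of nontrivial clades, takes the Robinson-Foulds distance as the size of the symmetric difference of those sets, and uses a Gibbs weight of the form exp(-beta * distance), where `beta` is a hypothetical penalty parameter. With three taxa there are only three rooted topologies, so the normalizing sum can be computed by brute force.

```python
import math


def rf_distance(tree_a, tree_b):
    """Robinson-Foulds distance: number of clades found in exactly one tree."""
    return len(tree_a ^ tree_b)


def gibbs_prior(tree, reference, all_trees, beta):
    """Probability of `tree` under a Gibbs distribution centered on `reference`.

    The normalizing function is the sum of weights over every topology,
    which is only feasible by enumeration for very small numbers of taxa.
    """
    def weight(candidate):
        return math.exp(-beta * rf_distance(candidate, reference))

    normalizer = sum(weight(candidate) for candidate in all_trees)
    return weight(tree) / normalizer


# The three rooted topologies on taxa A, B, C, each with one nontrivial clade.
topologies = [
    frozenset({frozenset("AB")}),  # ((A,B),C)
    frozenset({frozenset("AC")}),  # ((A,C),B)
    frozenset({frozenset("BC")}),  # ((B,C),A)
]
reference = topologies[0]
probs = [gibbs_prior(t, reference, topologies, beta=1.0) for t in topologies]
print(probs)
```

The prior favors the tree identical to its neighbor and penalizes the two alternatives equally. Because the number of topologies grows super-exponentially with the number of taxa, this brute-force sum quickly becomes infeasible, which is the hurdle the abstract says the exact algorithm and the fast approximations address.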