Results 1 - 20 of 53
1.
Emerg Infect Dis ; 29(5): 977-987, 2023 05.
Article in English | MEDLINE | ID: mdl-37081530

ABSTRACT

Combining genomic and geospatial data can be useful for understanding Mycobacterium tuberculosis transmission in high-burden tuberculosis (TB) settings. We performed whole-genome sequencing on M. tuberculosis DNA extracted from sputum cultures from a population-based TB study conducted in Gaborone, Botswana, during 2012-2016. We determined spatial distribution of cases on the basis of shared genotypes among isolates. We considered clusters of isolates with ≤5 single-nucleotide polymorphisms identified by whole-genome sequencing to indicate recent transmission and clusters of ≥10 persons to be outbreaks. We obtained both molecular and geospatial data for 946/1,449 (65%) participants with culture-confirmed TB; 62 persons belonged to 5 outbreaks of 10-19 persons each. We detected geospatial clustering in just 2 of those 5 outbreaks, suggesting heterogeneous spatial patterns. Our findings indicate that targeted interventions applied in smaller geographic areas of high-burden TB identified using integrated genomic and geospatial data might help interrupt TB transmission during outbreaks.


Subjects
Mycobacterium tuberculosis , Tuberculosis , Humans , Botswana/epidemiology , Tuberculosis/microbiology , Mycobacterium tuberculosis/genetics , Genotype , Genomics
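
The clustering rule stated in this abstract (isolates within ≤5 SNPs indicate recent transmission; clusters of ≥10 persons count as outbreaks) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the single-linkage rule, the function names, and the toy aligned sequences are all assumptions made for the example.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count positions where two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def cluster_isolates(seqs, max_snps=5):
    """Group isolates by single linkage (an assumption of this sketch):
    two isolates are linked when their SNP distance is <= max_snps."""
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= max_snps:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


def outbreaks(clusters, min_size=10):
    """Keep only clusters large enough to be called outbreaks."""
    return [c for c in clusters if len(c) >= min_size]
```

With real data the pairwise distances would come from a whole-genome variant-calling workflow rather than from direct string comparison; the thresholds are exposed as parameters so the 5-SNP and 10-person cutoffs from the abstract can be varied.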
2.
Biometrics ; 78(4): 1530-1541, 2022 12.
Article in English | MEDLINE | ID: mdl-34374071

ABSTRACT

Stochastic epidemic models (SEMs) fit to incidence data are critical to elucidating outbreak dynamics, shaping response strategies, and preparing for future epidemics. SEMs typically represent counts of individuals in discrete infection states using Markov jump processes (MJPs), but are computationally challenging as imperfect surveillance, lack of subject-level information, and temporal coarseness of the data obscure the true epidemic. Analytic integration over the latent epidemic process is impossible, and integration via Markov chain Monte Carlo (MCMC) is cumbersome due to the dimensionality and discreteness of the latent state space. Simulation-based computational approaches can address the intractability of the MJP likelihood, but are numerically fragile and prohibitively expensive for complex models. A linear noise approximation (LNA) that approximates the MJP transition density with a Gaussian density has been explored for analyzing prevalence data in large-population settings, but requires modification for analyzing incidence counts without assuming that the data are normally distributed. We demonstrate how to reparameterize SEMs to appropriately analyze incidence data, and fold the LNA into a data augmentation MCMC framework that outperforms deterministic methods, statistically, and simulation-based methods, computationally. Our framework is computationally robust when the model dynamics are complex and applies to a broad class of SEMs. We evaluate our method in simulations that reflect Ebola, influenza, and SARS-CoV-2 dynamics, and apply our method to national surveillance counts from the 2013-2015 West Africa Ebola outbreak.


Subjects
COVID-19 , Epidemics , Ebola Virus Disease , Humans , Ebola Virus Disease/epidemiology , Incidence , COVID-19/epidemiology , SARS-CoV-2 , Markov Chains , Monte Carlo Method , Stochastic Processes , Bayes Theorem
3.
Emerg Infect Dis ; 27(10): 2604-2618, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34545792

ABSTRACT

We conducted a detailed analysis of coronavirus disease in a large population center in southern California, USA (Orange County, population 3.2 million), to determine heterogeneity in risks for infection, test positivity, and death. We used a combination of datasets, including a population-representative seroprevalence survey, to assess the actual burden of disease and testing intensity, test positivity, and mortality. In the first month of the local epidemic (March 2020), case incidence clustered in high-income areas. This pattern quickly shifted, and cases next clustered in much higher rates in the north-central area of the county, which has a lower socioeconomic status. Beginning in April 2020, a concentration of reported cases, test positivity, testing intensity, and seropositivity in a north-central area persisted. At the individual level, several factors (e.g., age, race or ethnicity, and ZIP codes with low educational attainment) strongly affected risk for seropositivity and death.


Subjects
COVID-19 , Epidemics , California/epidemiology , Humans , SARS-CoV-2 , Seroepidemiologic Studies
4.
Syst Biol ; 69(2): 209-220, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31504998

ABSTRACT

The marginal likelihood of a model is a key quantity for assessing the evidence provided by the data in support of a model. The marginal likelihood is the normalizing constant for the posterior density, obtained by integrating the product of the likelihood and the prior with respect to model parameters. Thus, the computational burden of computing the marginal likelihood scales with the dimension of the parameter space. In phylogenetics, where we work with tree topologies that are high-dimensional models, standard approaches to computing marginal likelihoods are very slow. Here, we study methods to quickly compute the marginal likelihood of a single fixed tree topology. We benchmark the speed and accuracy of 19 different methods to compute the marginal likelihood of phylogenetic topologies on a suite of real data sets under the JC69 model. These methods include several new ones that we develop explicitly to solve this problem, as well as existing algorithms that we apply to phylogenetic models for the first time. Altogether, our results show that the accuracy of these methods varies widely, and that accuracy does not necessarily correlate with computational burden. Our newly developed methods are orders of magnitude faster than standard approaches, and in some cases, their accuracy rivals the best established estimators.


Subjects
Classification/methods , Phylogeny , Computational Biology/standards , Likelihood Functions
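
The definition of the marginal likelihood given in this abstract can be written out explicitly. The notation is an assumption introduced here for illustration: D denotes the sequence data, τ the fixed tree topology, and θ the continuous model parameters (for example, branch lengths).

```latex
% Posterior density over parameters \theta for a fixed topology \tau
p(\theta \mid D, \tau)
  = \frac{p(D \mid \theta, \tau)\, p(\theta \mid \tau)}{p(D \mid \tau)},
\qquad
% Marginal likelihood: the normalizing constant of that posterior
p(D \mid \tau)
  = \int p(D \mid \theta, \tau)\, p(\theta \mid \tau)\, \mathrm{d}\theta .
```

The integral runs over every dimension of θ, which is why the cost of evaluating it grows with the size of the parameter space.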
5.
PLoS Comput Biol ; 16(8): e1008030, 2020 08.
Article in English | MEDLINE | ID: mdl-32804924

ABSTRACT

The human body generates a diverse set of high affinity antibodies, the soluble form of B cell receptors (BCRs), that bind to and neutralize invading pathogens. The natural development of BCRs must be understood in order to design vaccines for highly mutable pathogens such as influenza and HIV. BCR diversity is induced by naturally occurring combinatorial "V(D)J" rearrangement, mutation, and selection processes. Most current methods for BCR sequence analysis focus on separately modeling the above processes. Statistical phylogenetic methods are often used to model the mutational dynamics of BCR sequence data, but these techniques do not consider all the complexities associated with B cell diversification such as the V(D)J rearrangement process. In particular, standard phylogenetic approaches assume the DNA bases of the progenitor (or "naive") sequence arise independently and according to the same distribution, ignoring the complexities of V(D)J rearrangement. In this paper, we introduce a novel approach to Bayesian phylogenetic inference for BCR sequences that is based on a phylogenetic hidden Markov model (phylo-HMM). This technique not only integrates a naive rearrangement model with a phylogenetic model for BCR sequence evolution but also naturally accounts for uncertainty in all unobserved variables, including the phylogenetic tree, via posterior distribution sampling.


Subjects
Genetic Models , B-Cell Antigen Receptors , DNA Sequence Analysis/methods , Bayes Theorem , Computational Biology , B-Lymphocyte Gene Rearrangement/genetics , Humans , Markov Chains , Phylogeny , B-Cell Antigen Receptors/classification , B-Cell Antigen Receptors/genetics , B-Cell Antigen Receptors/immunology , Immunoglobulin Somatic Hypermutation/genetics , Vaccines
6.
PLoS Comput Biol ; 16(10): e1007999, 2020 10.
Article in English | MEDLINE | ID: mdl-33112848

ABSTRACT

Birth-death processes have given biologists a model-based framework to answer questions about changes in the birth and death rates of lineages in a phylogenetic tree. Therefore birth-death models are central to macroevolutionary as well as phylodynamic analyses. Early approaches to studying temporal variation in birth and death rates using birth-death models faced difficulties due to the restrictive choices of birth and death rate curves through time. Sufficiently flexible time-varying birth-death models are still lacking. We use a piecewise-constant birth-death model, combined with both Gaussian Markov random field (GMRF) and horseshoe Markov random field (HSMRF) prior distributions, to approximate arbitrary changes in birth rate through time. We implement these models in the widely used statistical phylogenetic software platform RevBayes, allowing us to jointly estimate birth-death process parameters, phylogeny, and nuisance parameters in a Bayesian framework. We test both GMRF-based and HSMRF-based models on a variety of simulated diversification scenarios, and then apply them to both a macroevolutionary and an epidemiological dataset. We find that both models are capable of inferring variable birth rates and correctly rejecting variable models in favor of effectively constant models. In general the HSMRF-based model has higher precision than its GMRF counterpart, with little to no loss of accuracy. Applied to a macroevolutionary dataset of the Australian gecko family Pygopodidae (where birth rates are interpretable as speciation rates), the GMRF-based model detects a slow decrease whereas the HSMRF-based model detects a rapid speciation-rate decrease in the last 12 million years. Applied to an infectious disease phylodynamic dataset of sequences from HIV subtype A in Russia and Ukraine (where birth rates are interpretable as the rate of accumulation of new infections), our models detect a strongly elevated rate of infection in the 1990s.


Subjects
Birth Rate , Biological Models , Statistical Models , Mortality , Algorithms , Animals , Bayes Theorem , Biological Evolution , Computational Biology , Computer Simulation , Lizards/physiology
7.
PLoS Comput Biol ; 16(10): e1007774, 2020 10.
Article in English | MEDLINE | ID: mdl-33044955

ABSTRACT

Coalescent theory combined with statistical modeling allows us to estimate effective population size fluctuations from molecular sequences of individuals sampled from a population of interest. When sequences are sampled serially through time and the distribution of the sampling times depends on the effective population size, explicit statistical modeling of sampling times improves population size estimation. Previous work assumed that the genealogy relating sampled sequences is known and modeled sampling times as an inhomogeneous Poisson process with log-intensity equal to a linear function of the log-transformed effective population size. We improve this approach in two ways. First, we extend the method to allow for joint Bayesian estimation of the genealogy, effective population size trajectory, and other model parameters. Next, we improve the sampling time model by incorporating additional sources of information in the form of time-varying covariates. We validate our new modeling framework using a simulation study and apply our new methodology to analyses of population dynamics of seasonal influenza and to the recent Ebola virus outbreak in West Africa.


Subjects
Population Genetics/methods , Statistical Models , Population Density , Bayes Theorem , Computational Biology , Ebolavirus/genetics , Viral Genome/genetics , Ebola Virus Disease/epidemiology , Ebola Virus Disease/virology , Humans , Human Influenza/epidemiology , Human Influenza/virology , Orthomyxoviridae/genetics , Population Dynamics
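
The sampling-time model described in this abstract, an inhomogeneous Poisson process whose log-intensity is linear in the log-transformed effective population size, can be written as below. The symbols are assumptions chosen for illustration: λ(t) is the sampling intensity, N_e(t) the effective population size, and β₀, β₁ the regression coefficients. The second line shows one natural way that time-varying covariates x_j(t) with coefficients δ_j could enter; the abstract does not state the exact form used.

```latex
% Preferential sampling: log-intensity linear in log effective population size
\log \lambda(t) = \beta_0 + \beta_1 \log N_e(t)

% A possible extension with time-varying covariates (assumed form)
\log \lambda(t) = \beta_0 + \beta_1 \log N_e(t) + \sum_{j} \delta_j\, x_j(t)
```

Setting β₁ = 0 in the first line recovers sampling that is independent of population size.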
8.
Biometrics ; 76(3): 677-690, 2020 09.
Article in English | MEDLINE | ID: mdl-32277713

ABSTRACT

Phylodynamics is an area of population genetics that uses genetic sequence data to estimate past population dynamics. Modern state-of-the-art Bayesian nonparametric methods for recovering population size trajectories of unknown form use either change-point models or Gaussian process priors. Change-point models suffer from computational issues when the number of change-points is unknown and needs to be estimated. Gaussian process-based methods lack local adaptivity and cannot accurately recover trajectories that exhibit features such as abrupt changes in trend or varying levels of smoothness. We propose a novel, locally adaptive approach to Bayesian nonparametric phylodynamic inference that has the flexibility to accommodate a large class of functional behaviors. Local adaptivity results from modeling the log-transformed effective population size a priori as a horseshoe Markov random field, a recently proposed statistical model that blends together the best properties of the change-point and Gaussian process modeling paradigms. We use simulated data to assess model performance, and find that our proposed method results in reduced bias and increased precision when compared to contemporary methods. We also use our models to reconstruct past changes in genetic diversity of human hepatitis C virus in Egypt and to estimate population size changes of ancient and modern steppe bison. These analyses show that our new method captures features of the population size trajectories that were missed by the state-of-the-art methods.


Subjects
Population Genetics , Statistical Models , Bayes Theorem , Population Density , Population Dynamics
9.
Proc Natl Acad Sci U S A ; 114(29): E5854-E5863, 2017 07 18.
Article in English | MEDLINE | ID: mdl-28679631

ABSTRACT

Devoid of all known canonical actin-binding proteins, the prevalent parasite Giardia lamblia uses an alternative mechanism for cytokinesis. Unique aspects of this mechanism can potentially be leveraged for therapeutic development. Here, live-cell imaging methods were developed for Giardia to establish division kinetics and the core division machinery. Surprisingly, Giardia cytokinesis occurred with a median time that is ∼60 times faster than mammalian cells. In contrast to cells that use a contractile ring, actin was not concentrated in the furrow and was not directly required for furrow progression. Live-cell imaging and morpholino depletion of axonemal Paralyzed Flagella 16 indicated that flagella-based forces initiated daughter cell separation and provided a source for membrane tension. Inhibition of membrane partitioning blocked furrow progression, indicating a requirement for membrane trafficking to support furrow advancement. Rab11 was found to load onto the intracytoplasmic axonemes late in mitosis and to accumulate near the ends of nascent axonemes. These developing axonemes were positioned to coordinate trafficking into the furrow and mark the center of the cell in lieu of a midbody/phragmoplast. We show that flagella motility, Rab11, and actin coordination are necessary for proper abscission. Organisms representing three of the five eukaryotic supergroups lack myosin II of the actomyosin contractile ring. These results support an emerging view that flagella play a central role in cell division among protists that lack myosin II and additionally implicate the broad use of membrane tension as a mechanism to drive abscission.


Subjects
Cell Membrane/metabolism , Flagella/metabolism , Giardia lamblia/cytology , Myosins/metabolism , Actins/metabolism , Brefeldin A/pharmacology , Cell Membrane/drug effects , Cytokinesis/physiology , Gene Knockdown Techniques , Giardia lamblia/drug effects , Giardia lamblia/genetics , Giardia lamblia/metabolism , Mitosis , Myosins/genetics , Protozoan Proteins/genetics , Protozoan Proteins/metabolism , Recombinant Proteins/genetics , Recombinant Proteins/metabolism , Tubulin/metabolism , rab GTP-Binding Proteins/genetics , rab GTP-Binding Proteins/metabolism
10.
Mol Biol Evol ; 35(5): 1253-1265, 2018 05 01.
Article in English | MEDLINE | ID: mdl-29474671

ABSTRACT

Modern biological techniques enable very dense genetic sampling of unfolding evolutionary histories, and thus frequently sample some genotypes multiple times. This motivates strategies to incorporate genotype abundance information in phylogenetic inference. In this article, we synthesize a stochastic process model with standard sequence-based phylogenetic optimality, and show that tree estimation is substantially improved by doing so. Our method is validated with extensive simulations and an experimental single-cell lineage tracing study of germinal center B cell receptor affinity maturation.


Subjects
Genotype , Genetic Models , Phylogeny , Animals , B-Lymphocytes , Mice , Stochastic Processes
11.
PLoS Comput Biol ; 14(10): e1006388, 2018 10.
Article in English | MEDLINE | ID: mdl-30332400

ABSTRACT

B cells develop high affinity receptors during the course of affinity maturation, a cyclic process of mutation and selection. At the end of affinity maturation, a number of cells sharing the same ancestor (i.e. in the same "clonal family") are released from the germinal center; their amino acid frequency profile reflects the allowed and disallowed substitutions at each position. These clonal-family-specific frequency profiles, called "substitution profiles", are useful for studying the course of affinity maturation as well as for antibody engineering purposes. However, most often only a single sequence is recovered from each clonal family in a sequencing experiment, making it impossible to construct a clonal-family-specific substitution profile. Given the public release of many high-quality large B cell receptor datasets, one may ask whether it is possible to use such data in a prediction model for clonal-family-specific substitution profiles. In this paper, we present the method "Substitution Profiles Using Related Families" (SPURF), a penalized tensor regression framework that integrates information from a rich assemblage of datasets to predict the clonal-family-specific substitution profile for any single input sequence. Using this framework, we show that substitution profiles from similar clonal families can be leveraged together with simulated substitution profiles and germline gene sequence information to improve prediction. We fit this model on a large public dataset and validate the robustness of our approach on two external datasets. Furthermore, we provide a command-line tool in an open-source software package (https://github.com/krdav/SPURF) implementing these ideas and providing easy prediction using our pre-fit models.


Subjects
Amino Acid Substitution/genetics , Amino Acids/metabolism , B-Cell Antigen Receptors/chemistry , B-Cell Antigen Receptors/metabolism , Amino Acids/genetics , Animals , B-Lymphocytes/chemistry , B-Lymphocytes/metabolism , Clone Cells , Computational Biology , Protein Databases , Humans , Immunological Models , B-Cell Antigen Receptors/genetics , Recombinant Proteins/chemistry , Recombinant Proteins/genetics , Recombinant Proteins/metabolism
12.
BMC Genomics ; 19(1): 835, 2018 Nov 21.
Article in English | MEDLINE | ID: mdl-30463511

ABSTRACT

BACKGROUND: Helicobacter pylori is a human stomach pathogen, naturally-competent for DNA uptake, and prone to homologous recombination. Extensive homoplasy (i.e., phylogenetically-unlinked identical variations) observed in H. pylori genes is considered a hallmark of such recombination. However, H. pylori also exhibits a high mutation rate. The relative adaptive role of homologous recombination and mutation in species diversity is a highly-debated issue in biology. Recombination results in homoplasy. While convergent mutation can also account for homoplasy, its contribution is thought to be minor. We demonstrate here that, contrary to dogma, convergent mutation is a key contributor to Helicobacter pylori homoplasy, potentially driven by adaptive evolution of proteins. RESULTS: Our present genome-wide analysis shows that homoplastic nonsynonymous (amino acid replacement) changes are not typically accompanied by homoplastic synonymous (silent) variations. Moreover, the majority of the codon positions with homoplastic nonsynonymous changes also contain different (i.e. non-homoplastic) nonsynonymous changes arising from mutation only. This indicates that, to a considerable extent, nonsynonymous homoplasy is due to convergent mutations. High mutation rate or limited availability of evolvable sites cannot explain this excessive convergence, as suggested by our simulation studies. Rather, the genes with convergent mutations are overrepresented in distinct functional categories, suggesting possible selective responses to conditions such as distinct micro-niches in single hosts, and to differences in host genotype, physiology, habitat and diet. CONCLUSIONS: We propose that mutational convergence is a key player in H. pylori's adaptation and extraordinary persistence in human hosts. High frequency of mutational convergence could be due to saturation of evolvable sites capable of responding to selection pressures, while the number of mutable residues is far from saturation. We anticipate a similar scenario of mutational vs. recombinational genome dynamics or plasticity for other naturally competent microbes where strong positive selection could favor frequent convergent mutations in adaptive protein evolution.


Subjects
Biological Evolution , Helicobacter Infections/microbiology , Helicobacter pylori/genetics , Genetic Recombination , Stomach/microbiology , Genetic Variation , Bacterial Genome , Helicobacter pylori/pathogenicity , Humans , Phylogeny , Genetic Selection
13.
J Math Biol ; 76(4): 911-944, 2018 03.
Article in English | MEDLINE | ID: mdl-28741177

ABSTRACT

Birth-death processes track the size of a univariate population, but many biological systems involve interaction between populations, necessitating models for two or more populations simultaneously. A lack of efficient methods for evaluating finite-time transition probabilities of bivariate processes, however, has restricted statistical inference in these models. Researchers rely on computationally expensive methods such as matrix exponentiation or Monte Carlo approximation, restricting likelihood-based inference to small systems, or indirect methods such as approximate Bayesian computation. In this paper, we introduce the birth/birth-death process, a tractable bivariate extension of the birth-death process, where rates are allowed to be nonlinear. We develop an efficient algorithm to calculate its transition probabilities using a continued fraction representation of their Laplace transforms. Next, we identify several exemplary models arising in molecular epidemiology, macro-parasite evolution, and infectious disease modeling that fall within this class, and demonstrate advantages of our proposed method over existing approaches to inference in these models. Notably, the ubiquitous stochastic susceptible-infectious-removed (SIR) model falls within this class, and we emphasize that computable transition probabilities newly enable direct inference of parameters in the SIR model. We also propose a very fast method for approximating the transition probabilities under the SIR model via a novel branching process simplification, and compare it to the continued fraction representation method with application to the 17th century plague in Eyam. Although the two methods produce similar maximum a posteriori estimates, the branching process approximation fails to capture the correlation structure in the joint posterior distribution.


Subjects
Biological Models , Algorithms , Animals , Bayes Theorem , Communicable Diseases/epidemiology , Computational Biology , Computer Simulation , England/epidemiology , Epidemics/statistics & numerical data , 17th Century History , Host-Parasite Interactions , Humans , Likelihood Functions , Markov Chains , Mathematical Concepts , Monte Carlo Method , Plague/epidemiology , Plague/history , Probability , Stochastic Processes
14.
Syst Biol ; 65(3): 465-77, 2016 May.
Article in English | MEDLINE | ID: mdl-26738927

ABSTRACT

The anomaly zone, defined by the presence of gene tree topologies that are more probable than the true species tree, presents a major challenge to the accurate resolution of many parts of the Tree of Life. This discrepancy can result from consecutive rapid speciation events in the species tree. Similar to the problem of long-branch attraction, including more data via loci concatenation will only reinforce the support for the incorrect species tree. Empirical phylogenetic studies often employ coalescent-based species tree methods to avoid the anomaly zone, but to this point these studies have not had a method for providing any direct evidence that the species tree is actually in the anomaly zone. In this study, we use 16 species of lizards in the family Scincidae to investigate whether nodes that are difficult to resolve place the species tree within the anomaly zone. We analyze new phylogenomic data (429 loci), using both concatenation and coalescent-based species tree estimation, to locate conflicting topological signal. We then use the unifying principle of the anomaly zone, together with estimates of ancestral population sizes and species persistence times, to determine whether the observed phylogenetic conflict is a result of the anomaly zone. We identify at least three regions of the Scincidae phylogeny that provide demographic signatures consistent with the anomaly zone, and this new information helps reconcile the phylogenetic conflict in previously published studies on these lizards. The anomaly zone presents a real problem in phylogenetics, and our new framework for identifying anomalous relationships will help empiricists leverage their resources appropriately for investigating and overcoming this challenge.


Subjects
Classification/methods , Lizards/classification , Lizards/genetics , Genetic Models , Phylogeny , Animals , Genome/genetics
15.
PLoS Comput Biol ; 12(3): e1004789, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26938243

ABSTRACT

Phylodynamics seeks to estimate effective population size fluctuations from molecular sequences of individuals sampled from a population of interest. One way to accomplish this task formulates an observed sequence data likelihood exploiting a coalescent model for the sampled individuals' genealogy and then integrating over all possible genealogies via Monte Carlo or, less efficiently, by conditioning on one genealogy estimated from the sequence data. However, when analyzing sequences sampled serially through time, current methods implicitly assume either that sampling times are fixed deterministically by the data collection protocol or that their distribution does not depend on the size of the population. Through simulation, we first show that, when sampling times do probabilistically depend on effective population size, estimation methods may be systematically biased. To correct for this deficiency, we propose a new model that explicitly accounts for preferential sampling by modeling the sampling times as an inhomogeneous Poisson process dependent on effective population size. We demonstrate that in the presence of preferential sampling our new model not only reduces bias, but also improves estimation precision. Finally, we compare the performance of the currently used phylodynamic methods with our proposed model through clinically-relevant, seasonal human influenza examples.


Subjects
Population Genetics , Hemagglutinins/genetics , Influenza A Virus Subtype H3N2/genetics , Genetic Models , Statistical Models , Biological Evolution , Computer Simulation , Statistical Data Interpretation , Genetic Variation/genetics , Phylogeny , Sample Size
16.
Proc Natl Acad Sci U S A ; 111(44): E4736-42, 2014 Nov 04.
Article in English | MEDLINE | ID: mdl-25336755

ABSTRACT

Despite contingency in life's history, the similarity of evolutionarily convergent traits may represent predictable solutions to common conditions. However, the extent to which overall gene expression levels (transcriptomes) underlying convergent traits are themselves convergent remains largely unexplored. Here, we show strong statistical support for convergent evolutionary origins and massively parallel evolution of the entire transcriptomes in symbiotic bioluminescent organs (bacterial photophores) from two divergent squid species. The gene expression similarities are so strong that regression models of one species' photophore can predict organ identity of a distantly related photophore from gene expression levels alone. Our results point to widespread parallel changes in gene expression evolution associated with convergent origins of complex organs. Therefore, predictable solutions may drive not only the evolution of novel, complex organs but also the evolution of overall gene expression levels that underlie them.


Subjects
Bacteria/metabolism , Decapodiformes/metabolism , Molecular Evolution , Gene Expression Regulation/physiology , Symbiosis/physiology , Transcriptome/physiology , Animals , Bacteria/genetics , Decapodiformes/genetics
17.
Bioinformatics ; 31(20): 3282-9, 2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26093147

ABSTRACT

MOTIVATION: The field of phylodynamics focuses on the problem of reconstructing population size dynamics over time using current genetic samples taken from the population of interest. This technique has been extensively used in many areas of biology but is particularly useful for studying the spread of quickly evolving infectious diseases agents, e.g. influenza virus. Phylodynamic inference uses a coalescent model that defines a probability density for the genealogy of randomly sampled individuals from the population. When we assume that such a genealogy is known, the coalescent model, equipped with a Gaussian process prior on population size trajectory, allows for nonparametric Bayesian estimation of population size dynamics. Although this approach is quite powerful, large datasets collected during infectious disease surveillance challenge the state-of-the-art of Bayesian phylodynamics and demand inferential methods with relatively low computational cost. RESULTS: To satisfy this demand, we provide a computationally efficient Bayesian inference framework based on Hamiltonian Monte Carlo for coalescent process models. Moreover, we show that by splitting the Hamiltonian function, we can further improve the efficiency of this approach. Using several simulated and real datasets, we show that our method provides accurate estimates of population size dynamics and is substantially faster than alternative methods based on elliptical slice sampler and Metropolis-adjusted Langevin algorithm. AVAILABILITY AND IMPLEMENTATION: The R code for all simulation studies and real data analysis conducted in this article are publicly available at http://www.ics.uci.edu/~slan/lanzi/CODES.html and in the R package phylodyn available at https://github.com/mdkarcher/phylodyn. CONTACT: S.Lan@warwick.ac.uk or babaks@uci.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Population Genetics/methods , Algorithms , Bayes Theorem , Humans , Human Influenza/epidemiology , Statistical Models , Monte Carlo Method , Orthomyxoviridae/genetics , Population Density , Population Dynamics , Software , Nonparametric Statistics
18.
Stat Appl Genet Mol Biol ; 14(4): 375-89, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26061623

ABSTRACT

When estimating a phylogeny from a multiple sequence alignment, researchers often assume the absence of recombination. However, if recombination is present, then tree estimation and all downstream analyses will be impacted, because different segments of the sequence alignment support different phylogenies. Similarly, convergent selective pressures at the molecular level can also lead to phylogenetic tree incongruence across the sequence alignment. Current methods for detection of phylogenetic incongruence are not equipped to distinguish between these two different mechanisms and assume that the incongruence is a result of recombination or other horizontal transfer of genetic information. We propose a new recombination detection method that can make this distinction, based on synonymous codon substitution distances. Although some power is lost by discarding the information contained in the nonsynonymous substitutions, our new method has lower false positive probabilities than the comparable recombination detection method when the phylogenetic incongruence signal is due to convergent evolution. We apply our method to three empirical examples, where we analyze: (1) sequences from a transmission network of the human immunodeficiency virus, (2) tlpB gene sequences from a geographically diverse set of 38 Helicobacter pylori strains, and (3) hepatitis C virus sequences sampled longitudinally from one patient.
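
To make the distinction between synonymous and nonsynonymous substitutions concrete, here is a rough sketch that classifies codon differences between two aligned coding sequences. It is only a crude count, not the synonymous codon substitution distance used in the article, and it is not a recombination test. The function name `synonymous_differences` and the example sequences are assumptions for illustration; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def synonymous_differences(seq_a, seq_b):
    """Count codon sites that differ between two aligned coding sequences.

    Returns (synonymous, nonsynonymous). A differing codon pair is
    synonymous when both codons encode the same amino acid. Codons with
    gaps or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


if __name__ == "__main__":
    # Hypothetical toy alignment of three codons.
    print(synonymous_differences("TTTGCTAAA", "TTCGCTAGA"))
```

Restricting attention to the synonymous count discards the nonsynonymous signal, which is the trade-off the abstract describes: some power is lost, but differences driven by convergent selection on the protein are less likely to be mistaken for recombination.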


Subjects
Molecular Evolution , Genetic Models , Genetic Recombination , Algorithms , Computer Simulation , HIV Infections/transmission , HIV Infections/virology , HIV-1/genetics , Helicobacter Infections/microbiology , Helicobacter pylori/genetics , Hepacivirus/genetics , Hepatitis C/virology , Humans , Statistical Models , Phylogeny
19.
Syst Biol ; 63(4): 534-42, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24627183

ABSTRACT

The multispecies coalescent has provided important progress for evolutionary inferences, including increasing the statistical rigor and objectivity of comparisons among competing species delimitation models. However, Bayesian species delimitation methods typically require brute force integration over gene trees via Markov chain Monte Carlo (MCMC), which introduces a large computational burden and precludes their application to genomic-scale data. Here we combine a recently introduced dynamic programming algorithm for estimating species trees that bypasses MCMC integration over gene trees with sophisticated methods for estimating marginal likelihoods, needed for Bayesian model selection, to provide a rigorous and computationally tractable technique for genome-wide species delimitation. We provide a critical yet simple correction that brings the likelihoods of different species trees, and more importantly their corresponding marginal likelihoods, to the same common denominator, which enables direct and accurate comparisons of competing species delimitation models using Bayes factors. We test this approach, which we call Bayes factor delimitation (*with genomic data; BFD*), using common species delimitation scenarios with computer simulations. Varying the number of loci and the number of samples suggests that the approach can distinguish the true model even with few loci and limited samples per species. Misspecification of the prior for population size θ has little impact on support for the true model. We apply the approach to West African forest geckos (Hemidactylus fasciatus complex) using genome-wide SNP data. This new Bayesian method for species delimitation builds on a growing trend for objective species delimitation methods with explicit model assumptions that are easily tested. [Bayes factor; model testing; phylogeography; RADseq; simulation; speciation.].
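
The model comparison step described in this abstract can be sketched as follows, assuming log marginal likelihoods for each delimitation model have already been estimated by some other tool. The sketch only ranks models and reports twice the log Bayes factor, a commonly used scale for Bayes factors; it does not estimate marginal likelihoods. The function names, model labels, and numeric values are hypothetical.

```python
def two_ln_bayes_factor(log_ml_a, log_ml_b):
    """Return 2 * ln(BF) for model A against model B.

    Inputs are natural-log marginal likelihoods; positive values favor A.
    """
    return 2.0 * (log_ml_a - log_ml_b)


def rank_models(log_mls):
    """Rank models by log marginal likelihood, best first.

    log_mls maps a model label to its estimated log marginal likelihood.
    Each entry of the result is (label, 2 ln BF of the best model against
    that model), so the best model is listed with 0.0.
    """
    ranked = sorted(log_mls.items(), key=lambda item: item[1], reverse=True)
    best_log_ml = ranked[0][1]
    return [
        (label, two_ln_bayes_factor(best_log_ml, log_ml))
        for label, log_ml in ranked
    ]


if __name__ == "__main__":
    # Made-up values for three hypothetical delimitation models.
    estimates = {"lump": -1050.0, "current": -1020.0, "split": -1000.0}
    for label, support in rank_models(estimates):
        print(label, support)
```

Such a comparison is only meaningful when the marginal likelihoods are on a common scale, which is the point of the correction mentioned in the abstract.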


Subjects
Genome/genetics , Phylogeny , Phylogeography/methods , Single Nucleotide Polymorphism/genetics , Algorithms , Animals , Bayes Theorem , Computer Simulation , Lizards/genetics
20.
Biometrics ; 71(4): 1009-21, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26148963

ABSTRACT

Continuous-time birth-death-shift (BDS) processes are frequently used in stochastic modeling, with many applications in ecology and epidemiology. In particular, such processes can model evolutionary dynamics of transposable elements, which are important genetic markers in molecular epidemiology. Estimation of the effects of individual covariates on the birth, death, and shift rates of the process can be accomplished by analyzing patient data, but inferring these rates in a discretely and unevenly observed setting presents computational challenges. We propose a multi-type branching process approximation to BDS processes and develop a corresponding expectation maximization algorithm, where we use spectral techniques to reduce calculation of expected sufficient statistics to low-dimensional integration. These techniques yield an efficient and robust optimization routine for inferring the rates of the BDS process, and apply broadly to multi-type branching processes whose rates can depend on many covariates. After rigorously testing our methodology in simulation studies, we apply our method to study intrapatient time evolution of the IS6110 transposable element, a genetic marker frequently used during estimation of epidemiological clusters of Mycobacterium tuberculosis infections.
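
For intuition about the process being modeled, the sketch below forward-simulates a simple linear birth-death-shift process with the Gillespie algorithm. It is an assumption-laden toy: each of the n current elements is taken to duplicate, be lost, or shift position at constant per-element rates, and only the copy number and event counts are tracked. It is not the article's branching process approximation or its expectation maximization algorithm, and the function name `simulate_bds` and the rate values are hypothetical.

```python
import random


def simulate_bds(n0, birth, death, shift, t_end, seed=0):
    """Simulate a linear birth-death-shift process up to time t_end.

    birth, death and shift are per-element rates. A birth adds a copy,
    a death removes one, and a shift moves an element without changing
    the copy number. Returns (final copy number, event counts).
    """
    rng = random.Random(seed)
    t = 0.0
    n = n0
    counts = {"birth": 0, "death": 0, "shift": 0}
    per_element = birth + death + shift
    while n > 0 and per_element > 0:
        t += rng.expovariate(n * per_element)   # waiting time to next event
        if t > t_end:
            break
        u = rng.random() * per_element          # choose the event type
        if u < birth:
            n += 1
            counts["birth"] += 1
        elif u < birth + death:
            n -= 1
            counts["death"] += 1
        else:
            counts["shift"] += 1
    return n, counts


if __name__ == "__main__":
    # Hypothetical rates per element per unit time.
    print(simulate_bds(10, birth=0.5, death=0.3, shift=0.2, t_end=5.0))
```

In the setting the abstract describes, only snapshots of such a trajectory are observed at uneven times, which is why the event counts have to be handled as latent quantities during estimation.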


Subjects
Molecular Evolution , Interspersed Repetitive Sequences , Likelihood Functions , Algorithms , Animals , Biometry/methods , Humans , Genetic Models , Statistical Models , Molecular Epidemiology/statistics & numerical data , Population Dynamics/statistics & numerical data , Stochastic Processes