Results 1-20 of 179
1.
JAMA Intern Med ; 2020 Feb 17.
Article in English | MEDLINE | ID: mdl-32065600

ABSTRACT

Importance: Chlorthalidone is currently recommended as the preferred thiazide diuretic to treat hypertension, but no trials have directly compared risks and benefits. Objective: To compare the effectiveness and safety of chlorthalidone and hydrochlorothiazide as first-line therapies for hypertension in real-world practice. Design, Setting, and Participants: This is a Large-Scale Evidence Generation and Evaluation in a Network of Databases (LEGEND) observational comparative cohort study with large-scale propensity score stratification and negative-control and synthetic positive-control calibration on databases spanning January 2001 through December 2018. Outpatient and inpatient care episodes of first-time users of antihypertensive monotherapy in the United States based on 2 administrative claims databases and 1 collection of electronic health records were analyzed. Analysis began June 2018. Exposures: Chlorthalidone and hydrochlorothiazide. Main Outcomes and Measures: The primary outcomes were acute myocardial infarction, hospitalization for heart failure, ischemic or hemorrhagic stroke, and a composite cardiovascular disease outcome including the first 3 outcomes and sudden cardiac death. Fifty-one safety outcomes were measured. Results: Of 730 225 individuals (mean [SD] age, 51.5 [13.3] years; 450 100 women [61.6%]), 36 918 were dispensed or prescribed chlorthalidone and had 149 composite outcome events, and 693 337 were dispensed or prescribed hydrochlorothiazide and had 3089 composite outcome events. No significant difference was found in the associated risk of myocardial infarction, hospitalized heart failure, or stroke, with a calibrated hazard ratio for the composite cardiovascular outcome of 1.00 for chlorthalidone compared with hydrochlorothiazide (95% CI, 0.85-1.17). 
Chlorthalidone was associated with a significantly higher risk of hypokalemia (hazard ratio [HR], 2.72; 95% CI, 2.38-3.12), hyponatremia (HR, 1.31; 95% CI, 1.16-1.47), acute renal failure (HR, 1.37; 95% CI, 1.15-1.63), chronic kidney disease (HR, 1.24; 95% CI, 1.09-1.42), and type 2 diabetes mellitus (HR, 1.21; 95% CI, 1.12-1.30). Chlorthalidone was associated with a significantly lower risk of diagnosed abnormal weight gain (HR, 0.73; 95% CI, 0.61-0.86). Conclusions and Relevance: This study found that chlorthalidone use was not associated with significant cardiovascular benefits when compared with hydrochlorothiazide, while its use was associated with greater risk of renal and electrolyte abnormalities. These findings do not support current recommendations to prefer chlorthalidone over hydrochlorothiazide for hypertension treatment in first-time users. We used advanced methods, sensitivity analyses, and diagnostics, but given the possibility of residual confounding and the limited length of observation periods, further study is warranted.
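The negative-control calibration used in this kind of study can be illustrated with a minimal sketch: fit a normal distribution to log hazard ratio estimates from negative-control outcomes (whose true HR is 1), then test the outcome estimate against that empirical null. The real procedure additionally models each estimate's standard error, which this sketch omits; all numbers below are invented for illustration.

```python
import math

def calibrated_p_value(log_hr, negative_control_log_hrs):
    """Simplified empirical calibration: fit a normal distribution to
    negative-control estimates (true log HR = 0) and test the outcome
    estimate against that empirical null instead of N(0, se)."""
    n = len(negative_control_log_hrs)
    mu = sum(negative_control_log_hrs) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in negative_control_log_hrs) / (n - 1))
    z = (log_hr - mu) / sd
    # two-sided p-value from the standard normal CDF
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

# Invented negative-control log HRs scattered around a small positive bias
controls = [0.05, -0.02, 0.10, 0.04, -0.06, 0.08, 0.01, 0.03]
print(calibrated_p_value(0.0, controls))            # null effect: large p
print(calibrated_p_value(math.log(2.72), controls)) # large effect: tiny p
```

If the negative controls show systematic bias, the calibrated null is shifted and widened accordingly, so an outcome estimate must clear the observed bias, not just sampling error, to be declared significant.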

2.
Viruses ; 12(2)2020 Feb 05.
Article in English | MEDLINE | ID: mdl-32033422

ABSTRACT

Infections with HIV-1 group M subtype B viruses account for the majority of the HIV epidemic in the Western world. Phylogeographic studies have placed the introduction of subtype B in the United States in New York around 1970, where it grew into a major source of spread. Currently, it is estimated that over one million people are living with HIV in the US and that most are infected with subtype B variants. Here, we aim to identify the drivers of HIV-1 subtype B dispersal in the United States by analyzing a collection of 23,588 pol sequences, collected for drug resistance testing from 45 states during 2004-2011. To this end, we introduce a workflow to reduce this large collection of data to more computationally-manageable sample sizes and apply the BEAST framework to test which covariates associate with the spread of HIV-1 across state borders. Our results show that we are able to consistently identify certain predictors of spread under reasonable run times across datasets of up to 10,000 sequences. However, the general lack of phylogenetic structure and the high uncertainty associated with HIV trees make it difficult to interpret the epidemiological relevance of the drivers of spread we are able to identify. While the workflow we present here could be applied to other virus datasets of a similar scale, the characteristic star-like shape of HIV-1 phylogenies poses a serious obstacle to reconstructing a detailed evolutionary and spatial history for HIV-1 subtype B in the US.
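The down-sampling step of such a workflow can be pictured as a proportional stratified subsample by state. The helper below and its quotas are an illustrative assumption, not the paper's actual procedure.

```python
import random

def downsample_by_state(records, target, rng):
    """Reduce a large sequence collection to roughly `target` records,
    keeping each state's share proportional to its share of the full
    collection (at least one record per state)."""
    by_state = {}
    for seq_id, state in records:
        by_state.setdefault(state, []).append(seq_id)
    total = len(records)
    sample = []
    for state, ids in by_state.items():
        quota = max(1, round(target * len(ids) / total))
        sample.extend((i, state) for i in rng.sample(ids, min(quota, len(ids))))
    return sample

# Invented collection: 1,000 sequences from three states
records = ([(f"seq{i}", "NY") for i in range(600)]
           + [(f"seq{i}", "CA") for i in range(600, 900)]
           + [(f"seq{i}", "TX") for i in range(900, 1000)])
sample = downsample_by_state(records, 100, random.Random(1))
# quotas work out to 60 NY, 30 CA, and 10 TX records
```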

3.
Elife ; 9, 2020 Jan 15.
Article in English | MEDLINE | ID: mdl-31939738

ABSTRACT

For pathogens infecting a single host species, evolutionary trade-offs have previously been demonstrated between pathogen-induced mortality rates and transmission rates. It remains unclear, however, how such trade-offs impact sub-lethal pathogen-inflicted damage, and whether these trade-offs even occur in broad host-range pathogens. Here, we examine changes over the past 110 years in symptoms induced in maize by the broad host-range pathogen, maize streak virus (MSV). Specifically, we use the quantified symptom intensities of cloned MSV isolates in differentially resistant maize genotypes to phylogenetically infer ancestral symptom intensities and check for phylogenetic signal associated with these symptom intensities. We show that whereas symptoms reflecting harm to the host have remained constant or decreased, there has been an increase in how extensively MSV colonizes the cells upon which transmission vectors feed. This demonstrates an evolutionary trade-off between amounts of pathogen-inflicted harm and how effectively viruses position themselves within plants to enable onward transmission.

4.
Korean Circ J ; 50(1): 52-68, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31642211

ABSTRACT

BACKGROUND AND OBJECTIVES: The 2018 ESC/ESH hypertension guideline recommends a 2-drug combination as initial anti-hypertensive therapy. However, real-world evidence for the effectiveness of the recommended regimens remains limited. We aimed to compare the effectiveness of first-line anti-hypertensive treatment combining 2 out of the following classes: angiotensin-converting enzyme (ACE) inhibitors/angiotensin-receptor blockers (A), calcium channel blockers (C), and thiazide-type diuretics (D). METHODS: Treatment-naïve hypertensive adults without cardiovascular disease (CVD) who initiated dual anti-hypertensive medications were identified in 5 databases from the US and Korea. The patients were matched for each comparison set by large-scale propensity score matching. The primary endpoint was all-cause mortality. Myocardial infarction, heart failure, stroke, and major adverse cardiac and cerebrovascular events as a composite outcome comprised the secondary measures. RESULTS: A total of 987,983 patients met the eligibility criteria. After matching, 222,686, 32,344, and 38,513 patients were allocated to the A+C vs. A+D, C+D vs. A+C, and C+D vs. A+D comparisons, respectively. There was no significant difference in mortality during a total of 1,806,077 person-years: A+C vs. A+D (hazard ratio [HR], 1.08; 95% confidence interval [CI], 0.97-1.20; p=0.127), C+D vs. A+C (HR, 0.93; 95% CI, 0.87-1.01; p=0.067), and C+D vs. A+D (HR, 1.18; 95% CI, 0.95-1.47; p=0.104). A+C was associated with a slightly higher risk of heart failure (HR, 1.09; 95% CI, 1.01-1.18; p=0.040) and stroke (HR, 1.08; 95% CI, 1.01-1.17; p=0.040) than A+D. CONCLUSIONS: There was no significant difference in mortality among the A+C, A+D, and C+D combination treatments in patients without previous CVD. This finding was consistent across multi-national heterogeneous cohorts in real-world practice.
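Propensity score matching pairs each treated patient with the control nearest on the estimated propensity score. A minimal greedy 1:1 sketch with a caliper; the study's actual large-scale implementation is far more elaborate, and the scores below are invented.

```python
def greedy_match(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbour propensity-score matching.
    treated/control: lists of (patient_id, propensity_score).
    Returns matched (treated_id, control_id) pairs; a treated patient is
    left unmatched if no remaining control lies within the caliper."""
    available = list(control)
    pairs = []
    for tid, ps in sorted(treated, key=lambda x: x[1]):
        if not available:
            break
        best = min(available, key=lambda c: abs(c[1] - ps))
        if abs(best[1] - ps) <= caliper:
            pairs.append((tid, best[0]))
            available.remove(best)
    return pairs

# Invented propensity scores
treated = [("t1", 0.30), ("t2", 0.62), ("t3", 0.90)]
control = [("c1", 0.28), ("c2", 0.33), ("c3", 0.60), ("c4", 0.45)]
print(greedy_match(treated, control))  # t3 finds no control within the caliper
```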

5.
Bioinformatics ; 2019 Dec 02.
Article in English | MEDLINE | ID: mdl-31790143

ABSTRACT

MOTIVATION: The potentially low precision associated with the geographic origin of sampled sequences represents an important limitation for spatially-explicit (i.e. continuous) phylogeographic inference of fast-evolving pathogens such as RNA viruses. A substantial proportion of publicly available sequences are geo-referenced at a broad spatial scale, such as the administrative unit of origin, rather than at more exact locations (e.g. GPS coordinates). Most frequently, such sequences are either discarded prior to continuous phylogeographic inference or, for lack of a better possibility, arbitrarily assigned to the geographic coordinates of the centroid of their administrative area of origin. RESULTS: We here implement and describe a new approach that allows heterogeneous prior sampling probabilities to be incorporated over a geographic area. External data, such as outbreak locations, are used to specify these prior sampling probabilities over a collection of sub-polygons. We apply this new method to the analysis of highly pathogenic avian influenza (HPAI) H5N1 clade data in the Mekong region. Our method allows us to properly include, in continuous phylogeographic analyses, H5N1 sequences that are only associated with large administrative areas of origin and to assign them more accurate locations. Finally, we use continuous phylogeographic reconstructions to analyse the dispersal dynamics of different H5N1 clades and investigate the impact of environmental factors on lineage dispersal velocities. AVAILABILITY: Our new method allowing heterogeneous sampling priors for continuous phylogeographic inference is implemented in the open-source multi-platform software package BEAST 1.10. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online and on figshare.com.

6.
Stat Med ; 2019 Dec 08.
Article in English | MEDLINE | ID: mdl-31814146

ABSTRACT

Sparse high-dimensional massive sample size (sHDMSS) time-to-event data present multiple challenges to quantitative researchers, as most current sparse survival regression methods and software will grind to a halt and become practically inoperable. This paper develops a scalable ℓ0-based sparse Cox regression tool for right-censored time-to-event data that easily takes advantage of existing high-performance implementations of ℓ2-penalized regression for sHDMSS time-to-event data. Specifically, we extend the ℓ0-based broken adaptive ridge (BAR) methodology to the Cox model, which involves repeatedly performing reweighted ℓ2-penalized regression. We rigorously show that the resulting estimator for the Cox model is selection consistent, oracle for parameter estimation, and has a grouping property for highly correlated covariates. Furthermore, we implement our BAR method in an R package for sHDMSS time-to-event data by leveraging existing efficient algorithms for massive ℓ2-penalized Cox regression. We evaluate the BAR Cox regression method by extensive simulations and illustrate its application on sHDMSS time-to-event data from the National Trauma Data Bank with hundreds of thousands of observations and tens of thousands of sparsely represented covariates.
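The BAR idea, repeated ridge solves whose per-coefficient weights come from the previous iterate, can be sketched for ordinary least squares (the paper develops it for the Cox model). The data, penalty value, and warm start below are invented for illustration.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def bar_linear(X, y, lam=0.1, iters=30, eps=1e-8):
    """Broken adaptive ridge for least squares: repeatedly solve a ridge
    problem whose per-coefficient weight lam / beta_j^2 comes from the
    previous iterate, approximating an l0 penalty."""
    n, p = len(X), len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = [1.0] * p  # crude warm start (a plain ridge fit would be typical)
    for _ in range(iters):
        A = [row[:] for row in XtX]
        for j in range(p):
            A[j][j] += lam / max(beta[j] ** 2, eps)
        beta = solve(A, Xty)
    return beta

# Invented noise-free data: y depends on the first two covariates only
X = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [2, 1, 1],
     [1, 2, 0], [0, 2, 1], [2, 0, 0], [1, 1, 1]]
y = [2 * r[0] - r[1] for r in X]
beta = bar_linear(X, y)
# relevant coefficients stay near (2, -1); the null one collapses toward 0
```

The mechanism is visible in the diagonal update: as a coefficient shrinks, its penalty weight grows, driving it further toward zero, while large coefficients are barely penalized.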

7.
PLoS Pathog ; 15(12): e1007976, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31809523

ABSTRACT

Since the ignition of the HIV-1 group M pandemic at the beginning of the 20th century, group M lineages have spread heterogeneously throughout the world. Subtype C spread rapidly through sub-Saharan Africa and is currently the dominant HIV lineage worldwide. Yet the epidemiological and evolutionary circumstances that contributed to its epidemiological expansion remain poorly understood. Here, we analyse 346 novel pol sequences from the Democratic Republic of the Congo (DRC) to compare the evolutionary dynamics of the main HIV-1 lineages, subtypes A1, C and D. Our results place the origins of subtype C in the 1950s in Mbuji-Mayi, the mining city of southern DRC, while subtypes A1 and D emerged in the capital city of Kinshasa, and subtypes H and J in the less accessible port city of Matadi. Following a 15-year period of local transmission in southern DRC, we find that subtype C spread at least three-fold faster than other subtypes circulating in Central and East Africa. In conclusion, our results shed light on the origins of HIV-1 main lineages and suggest that socio-historical rather than evolutionary factors may have determined the epidemiological fate of subtype C in sub-Saharan Africa.

8.
Virus Evol ; 5(2): vez036, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31720009

ABSTRACT

The need to estimate divergence times in evolutionary histories in the presence of various sources of substitution rate variation has stimulated a rich development of relaxed molecular clock models. Viral evolutionary studies frequently adopt an uncorrelated clock model as a generic relaxed molecular clock process, but this may impose considerable estimation bias if discrete rate variation exists among clades or lineages. For HIV-1 group M, rate variation among subtypes has been shown to result in inconsistencies in time to the most recent common ancestor estimation. Although this calls into question the adequacy of available molecular dating methods, no solution to this problem has been offered so far. Here, we investigate the use of mixed effects molecular clock models, which combine both fixed and random effects in the evolutionary rate, to estimate divergence times. Using simulation, we demonstrate that this model outperforms existing molecular clock models in a Bayesian framework for estimating time-measured phylogenies in the presence of mixed sources of rate variation, while also maintaining good performance in simpler scenarios. By analysing a comprehensive HIV-1 group M complete genome data set we confirm considerable rate variation among subtypes that is not adequately modelled by uncorrelated relaxed clock models. The mixed effects clock model can accommodate this rate variation and results in a time to the most recent common ancestor of HIV-1 group M of 1920 (1915-25), which is only slightly earlier than the uncorrelated relaxed clock estimate for the same data set. The use of complete genome data appears to have a more profound impact than the molecular clock model because it reduces the credible intervals by 50 per cent relative to similar estimates based on short envelope gene sequences.
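A mixed effects clock can be pictured as a log-rate that is the sum of a global mean (fixed), a clade-level fixed effect, and a branch-level random effect. A sketch with invented effect sizes; these numbers are not the paper's estimates, and in the full model the branch effect is drawn from N(0, sigma^2) rather than passed in.

```python
import math

def branch_rate(mu, clade_effect, branch_effect):
    """Mixed-effects clock sketch: the log evolutionary rate of a branch is
    a global mean plus a clade-level fixed effect plus a branch-level
    random effect (here supplied explicitly for determinism)."""
    return math.exp(mu + clade_effect + branch_effect)

# Invented log-scale effects for two hypothetical clades
mu = math.log(2.0e-3)                   # global mean rate, subs/site/year
rate_fast = branch_rate(mu, 0.4, 0.0)   # clade with elevated rate
rate_slow = branch_rate(mu, -0.3, 0.0)  # clade with depressed rate
expected_divergence = rate_fast * 25.0  # expected subs/site on a 25-year branch
```

Working on the log scale keeps rates positive and makes the fixed and random contributions additive, which is what lets discrete clade-level variation coexist with continuous branch-to-branch noise.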

9.
J Math Biol ; 2019 Nov 21.
Article in English | MEDLINE | ID: mdl-31754778

ABSTRACT

Maximum likelihood estimators are used extensively to estimate unknown parameters of stochastic trait evolution models on phylogenetic trees. Although the MLE has been proven to converge to the true value in the independent-sample case, we cannot appeal to this result because trait values of different species are correlated due to shared evolutionary history. In this paper, we consider a 2-state symmetric model for a single binary trait and investigate the theoretical properties of the MLE for the transition rate in the large-tree limit. Here, the large-tree limit is a theoretical scenario where the number of taxa increases to infinity and we can observe the trait values for all species. Specifically, we prove that the MLE converges to the true value under some regularity conditions. These conditions ensure that the tree shape is not too irregular, and they hold for many practical scenarios such as trees with bounded edges, trees generated from the Yule (pure birth) process, and trees generated from the coalescent point process. Our result also provides an upper bound for the distance between the MLE and the true value.
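For intuition, the 2-state symmetric model gives a closed-form probability that the two ends of a branch carry different states, and the rate MLE can be found by grid search. The sketch below uses independent branches rather than a shared tree, so it deliberately sidesteps exactly the correlation the paper addresses; it only illustrates the model and the estimator.

```python
import math

def p_diff(q, t):
    """2-state symmetric CTMC: probability that the two endpoints of a
    branch of length t are in different states, given transition rate q."""
    return 0.5 * (1.0 - math.exp(-2.0 * q * t))

def log_lik(q, data):
    """data: (branch_length, endpoints_differ) for independent branches."""
    ll = 0.0
    for t, differs in data:
        p = p_diff(q, t)
        ll += math.log(p if differs else 1.0 - p)
    return ll

def mle(data):
    grid = [i / 1000.0 for i in range(1, 3001)]
    return max(grid, key=lambda q: log_lik(q, data))

# 3 of 10 unit-length branches show a change, so the MLE solves
# p_diff(q, 1) = 0.3, i.e. q = -ln(0.4)/2 ~ 0.458
data = [(1.0, True)] * 3 + [(1.0, False)] * 7
print(mle(data))  # → 0.458
```

Note that p_diff saturates at 1/2 as qt grows, which is why very high rates become unidentifiable from a single branch: beyond saturation all rates explain the data equally well.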

10.
Lancet ; 394(10211): 1816-1826, 2019 11 16.
Article in English | MEDLINE | ID: mdl-31668726

ABSTRACT

BACKGROUND: Uncertainty remains about the optimal monotherapy for hypertension, with current guidelines recommending any primary agent among the first-line drug classes thiazide or thiazide-like diuretics, angiotensin-converting enzyme inhibitors, angiotensin receptor blockers, dihydropyridine calcium channel blockers, and non-dihydropyridine calcium channel blockers, in the absence of comorbid indications. Randomised trials have not further refined this choice. METHODS: We developed a comprehensive framework for real-world evidence that enables comparative effectiveness and safety evaluation across many drugs and outcomes from observational data encompassing millions of patients, while minimising inherent bias. Using this framework, we did a systematic, large-scale study under a new-user cohort design to estimate the relative risks of three primary (acute myocardial infarction, hospitalisation for heart failure, and stroke) and six secondary effectiveness and 46 safety outcomes comparing all first-line classes across a global network of six administrative claims and three electronic health record databases. The framework addressed residual confounding, publication bias, and p-hacking using large-scale propensity adjustment, a large set of control outcomes, and full disclosure of hypotheses tested. FINDINGS: Using 4·9 million patients, we generated 22 000 calibrated, propensity-score-adjusted hazard ratios (HRs) comparing all classes and outcomes across databases. Most estimates revealed no effectiveness differences between classes; however, thiazide or thiazide-like diuretics showed better primary effectiveness than angiotensin-converting enzyme inhibitors: acute myocardial infarction (HR 0·84, 95% CI 0·75-0·95), hospitalisation for heart failure (0·83, 0·74-0·95), and stroke (0·83, 0·74-0·95) risk while on initial treatment. Safety profiles also favoured thiazide or thiazide-like diuretics over angiotensin-converting enzyme inhibitors. 
The non-dihydropyridine calcium channel blockers were significantly inferior to the other four classes. INTERPRETATION: This comprehensive framework introduces a new way of doing observational health-care science at scale. The approach supports equivalence between drug classes for initiating monotherapy for hypertension, in keeping with current guidelines, with the exceptions of the superiority of thiazide or thiazide-like diuretics to angiotensin-converting enzyme inhibitors and the inferiority of non-dihydropyridine calcium channel blockers. FUNDING: US National Science Foundation, US National Institutes of Health, Janssen Research & Development, IQVIA, South Korean Ministry of Health & Welfare, Australian National Health and Medical Research Council.


Subjects
Antihypertensive Agents/therapeutic use , Hypertension/drug therapy , Adolescent , Adult , Aged , Angiotensin Receptor Antagonists/adverse effects , Angiotensin Receptor Antagonists/therapeutic use , Angiotensin-Converting Enzyme Inhibitors/adverse effects , Angiotensin-Converting Enzyme Inhibitors/therapeutic use , Antihypertensive Agents/adverse effects , Calcium Channel Blockers/adverse effects , Calcium Channel Blockers/therapeutic use , Child , Cohort Studies , Comparative Effectiveness Research/methods , Databases, Factual , Diuretics/adverse effects , Diuretics/therapeutic use , Evidence-Based Medicine/methods , Female , Heart Failure/etiology , Heart Failure/prevention & control , Humans , Hypertension/complications , Male , Middle Aged , Myocardial Infarction/etiology , Myocardial Infarction/prevention & control , Stroke/etiology , Stroke/prevention & control , Young Adult
11.
Stat Med ; 38(22): 4199-4208, 2019 Sep 30.
Article in English | MEDLINE | ID: mdl-31436848

ABSTRACT

The case-control design is widely used in retrospective database studies, often leading to spectacular findings. However, results of these studies often cannot be replicated, and the advantage of this design over others is questionable. To demonstrate the shortcomings of applications of this design, we replicate two published case-control studies. The first investigates isotretinoin and ulcerative colitis using a simple case-control design. The second focuses on dipeptidyl peptidase-4 inhibitors and acute pancreatitis, using a nested case-control design. We include large sets of negative control exposures (where the true odds ratio is believed to be 1) in both studies. Both replication studies produce effect size estimates consistent with the original studies, but also generate estimates for the negative control exposures showing substantial residual bias. In contrast, applying a self-controlled design to answer the same questions using the same data reveals far less bias. Although the case-control design in general is not at fault, its application in retrospective database studies, where all exposure and covariate data for the entire cohort are available, is unnecessary, as other alternatives such as cohort and self-controlled designs are available. Moreover, by focusing on cases and controls, the design opens the door to inappropriate comparisons between exposure groups, leading to confounding that it has few options to adjust for. We argue that this design should no longer be used in these types of data. At the very least, negative control exposures should be used to prove that the concerns raised here do not apply.
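A negative-control check in the case-control setting reduces to computing odds ratios for exposures whose true OR is believed to be 1 and seeing whether the confidence intervals exclude 1. The 2x2 counts below are invented.

```python
import math

def odds_ratio_ci(a, b, c, d):
    """Odds ratio from a 2x2 case-control table
    (a/b = exposed/unexposed cases, c/d = exposed/unexposed controls),
    with a 95% CI from the usual log-OR normal approximation."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (or_,
            math.exp(math.log(or_) - 1.96 * se),
            math.exp(math.log(or_) + 1.96 * se))

# Invented counts for a negative-control exposure: the true OR should be 1,
# so a CI that excludes 1 signals residual bias in the design
or_, lo, hi = odds_ratio_ci(40, 160, 25, 175)
biased = lo > 1.0 or hi < 1.0
print(round(or_, 2), biased)  # → 1.75 True
```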

12.
Methods Mol Biol ; 1910: 691-722, 2019.
Article in English | MEDLINE | ID: mdl-31278682

ABSTRACT

In this chapter, we focus on the computational challenges associated with statistical phylogenomics and how use of the broad-platform evolutionary analysis general likelihood evaluator (BEAGLE), a high-performance library for likelihood computation, can help to substantially reduce computation time in phylogenomic and phylodynamic analyses. We discuss computational improvements brought about by the BEAGLE library on a variety of state-of-the-art multicore hardware, and for a range of commonly used evolutionary models. For data sets of varying dimensions, we specifically focus on comparing performance in the Bayesian evolutionary analysis by sampling trees (BEAST) software between multicore central processing units (CPUs) and a wide range of graphics processing cards (GPUs). We put special emphasis on computational benchmarks from the field of phylodynamics, which combines the challenges of phylogenomics with those of modelling trait data associated with the observed sequence data. In conclusion, we show that for increasingly large molecular sequence data sets, GPUs can offer tremendous computational advancements through the use of the BEAGLE library, which is available for software packages for both Bayesian inference and maximum-likelihood frameworks.
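The core computation BEAGLE accelerates is the post-order (pruning) partial-likelihood recursion. A pure-Python sketch for one site under the Jukes-Cantor model; BEAGLE's actual API operates on buffers of partials and queued operations across many sites and models, all of which this omits.

```python
import math

def jc_matrix(t):
    """Jukes-Cantor transition probabilities P(i -> j) for branch length t."""
    same = 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)
    diff = 0.25 - 0.25 * math.exp(-4.0 * t / 3.0)
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def partials_at_parent(child_partials, branch_lengths):
    """Pruning step: for each parent state i, multiply over children the
    matrix-weighted sum of that child's partial likelihoods."""
    out = []
    for i in range(4):
        val = 1.0
        for partials, t in zip(child_partials, branch_lengths):
            P = jc_matrix(t)
            val *= sum(P[i][j] * partials[j] for j in range(4))
        out.append(val)
    return out

# One site, two leaves observing A (state 0) and C (state 1), branches 0.1 long
leaf_a = [1.0, 0.0, 0.0, 0.0]
leaf_c = [0.0, 1.0, 0.0, 0.0]
root = partials_at_parent([leaf_a, leaf_c], [0.1, 0.1])
site_likelihood = sum(0.25 * p for p in root)  # uniform root frequencies
```

Each inner sum-product is independent across sites and states, which is precisely the structure that maps well onto the many-core GPUs discussed above.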


Subjects
Bayes Theorem , Computational Biology , Genomics , Phylogeny , Software , Animals , Computational Biology/methods , Evolution, Molecular , Genomics/methods , Humans , Markov Chains , Models, Statistical , Monte Carlo Method , Reproducibility of Results
14.
PLoS Comput Biol ; 15(4): e1006650, 2019 04.
Article in English | MEDLINE | ID: mdl-30958812

ABSTRACT

Elaboration of Bayesian phylogenetic inference methods has continued at pace in recent years with major new advances in nearly all aspects of the joint modelling of evolutionary data. It is increasingly appreciated that some evolutionary questions can only be adequately answered by combining evidence from multiple independent sources of data, including genome sequences, sampling dates, phenotypic data, radiocarbon dates, fossil occurrences, and biogeographic range information among others. Including all relevant data into a single joint model is very challenging both conceptually and computationally. Advanced computational software packages that allow robust development of compatible (sub-)models which can be composed into a full model hierarchy have played a key role in these developments. Developing such software frameworks is increasingly a major scientific activity in its own right, and comes with specific challenges, from practical software design, development and engineering challenges to statistical and conceptual modelling challenges. BEAST 2 is one such computational software platform, and was first announced over 4 years ago. Here we describe a series of major new developments in the BEAST 2 core platform and model hierarchy that have occurred since the first release of the software, culminating in the recent 2.5 release.


Subjects
Bayes Theorem , Biological Evolution , Phylogeny , Software , Animals , Computational Biology , Computer Simulation , Evolution, Molecular , Humans , Markov Chains , Models, Genetic , Monte Carlo Method
15.
Syst Biol ; 68(6): 1052-1061, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31034053

ABSTRACT

BEAGLE is a high-performance likelihood-calculation library for phylogenetic inference. The BEAGLE library defines a simple, but flexible, application programming interface (API), and includes a collection of efficient implementations for calculation under a variety of evolutionary models on different hardware devices. The library has been integrated into recent versions of popular phylogenetics software packages including BEAST and MrBayes and has been widely used across a diverse range of evolutionary studies. Here, we present BEAGLE 3 with new parallel implementations, increased performance for challenging data sets, improved scalability, and better usability. We have added new OpenCL and central processing unit-threaded implementations to the library, allowing the effective utilization of a wider range of modern hardware. Further, we have extended the API and library to support concurrent computation of independent partial likelihood arrays, for increased performance of nucleotide-model analyses with greater flexibility of data partitioning. For better scalability and usability, we have improved how phylogenetic software packages use BEAGLE in multi-GPU (graphics processing unit) and cluster environments, and introduced an automated method to select the fastest device given the data set, evolutionary model, and hardware. For application developers who wish to integrate the library, we also have developed an online tutorial. To evaluate the effect of the improvements, we ran a variety of benchmarks on state-of-the-art hardware. For a partitioned exemplar analysis, we observe run-time performance improvements as high as 5.9-fold over our previous GPU implementation. BEAGLE 3 is free, open-source software licensed under the Lesser GPL and available at https://beagle-dev.github.io.


Subjects
Classification/methods , Software/standards , Data Interpretation, Statistical , Phylogeny
16.
Mol Biol Evol ; 36(8): 1793-1803, 2019 08 01.
Article in English | MEDLINE | ID: mdl-31004175

ABSTRACT

Many factors complicate the estimation of time scales for phylogenetic histories, requiring increasingly complex evolutionary models and inference procedures. The widespread application of molecular clock dating has led to the insight that evolutionary rate estimates may vary with the time frame of measurement. This is particularly well established for rapidly evolving viruses that can accumulate sequence divergence over years or even months. However, this rapid evolution stands at odds with a relatively high degree of conservation of viruses or endogenous virus elements over much longer time scales. Building on recent insights into time-dependent evolutionary rates, we develop a formal and flexible Bayesian statistical inference approach that accommodates rate variation through time. We evaluate the novel molecular clock model on a foamy virus cospeciation history and a lentivirus evolutionary history and compare the performance to other molecular clock models. For both virus examples, we estimate a similarly strong time-dependent effect that implies rates varying over four orders of magnitude. The application of an analogous codon substitution model does not implicate long-term purifying selection as the cause of this effect. However, selection does appear to affect divergence time estimates for the less deep evolutionary history of the Ebolavirus genus. Finally, we explore the application of our approach on woolly mammoth ancient DNA data, which shows a much weaker, but still important, time-dependent rate effect that has a noticeable impact on node age estimates. Future developments aimed at incorporating more complex evolutionary processes will further add to the broad applicability of our approach.
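A piecewise-constant form of a time-dependent clock makes the idea concrete: the expected number of substitutions on a branch is the integral of the rate over the branch's time interval. The epoch boundaries and rates below are invented, and the paper's model uses a smoother rate function than this step sketch.

```python
def expected_substitutions(t0, t1, epochs):
    """Piecewise-constant time-dependent clock: integrate rate(t) over the
    branch's time interval [t0, t1]. epochs: (start, end, rate) tuples."""
    total = 0.0
    for start, end, rate in epochs:
        lo, hi = max(t0, start), min(t1, end)
        if hi > lo:
            total += rate * (hi - lo)
    return total

# Invented rates: recent history evolves 10x faster than deep history
epochs = [(0.0, 100.0, 1.0e-3), (100.0, 1000.0, 1.0e-4)]
d = expected_substitutions(50.0, 150.0, epochs)  # 50*1e-3 + 50*1e-4 = 0.055
```

Under such a rate function, a branch spanning the epoch boundary accumulates most of its divergence in the fast epoch, which is how apparent rates can differ by orders of magnitude depending on the time frame of measurement.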

17.
Virus Evol ; 5(1): vey043, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30838129

ABSTRACT

Discrete phylogeography using software such as BEAST considers the sampling location of each taxon as fixed, often a single location without uncertainty. When studying viruses, this implies that there is no possibility that the location of the infected host for that taxon is somewhere else. Here, we relaxed this strong assumption and allowed for analytic integration of uncertainty for discrete virus phylogeography. We used automatic language processing methods to find and assign uncertainty to alternative potential locations. We considered two influenza case studies: H5N1 in Egypt and H1N1 pdm09 in North America. For each, we implemented scenarios in which 25 per cent of the taxa had different amounts of sampling uncertainty, including 10, 30, and 50 per cent uncertainty, and varied how it was distributed for each taxon. This includes scenarios that: (i) placed a specific amount of uncertainty on one location while uniformly distributing the remaining amount across all other candidate locations (correspondingly labeled 10, 30, and 50); (ii) assigned the remaining uncertainty to just one other location, thus 'splitting' the uncertainty among two locations (i.e. 10/90, 30/70, and 50/50); and (iii) eliminated uncertainty via two predefined heuristic approaches: assignment to a centroid location (CNTR) or the largest population in the country (POP). We compared all scenarios to a reference standard (RS) in which all taxa had known (absolutely certain) locations. From this, we implemented five random selections of 25 per cent of the taxa and used these for specifying uncertainty. We performed posterior analyses for each scenario, including: (a) virus persistence, (b) migration rates, (c) trunk rewards, and (d) the posterior probability of the root state. The scenarios with sampling uncertainty were closer to the RS than CNTR and POP.
For H5N1, the absolute error of virus persistence had a median range of 0.005-0.047 for scenarios with sampling uncertainty-(i) and (ii) above-versus a range of 0.063-0.075 for CNTR and POP. Persistence for the pdm09 case study followed a similar trend as did our analyses of migration rates across scenarios (i) and (ii). When considering the posterior probability of the root state, we found all but one of the H5N1 scenarios with sampling uncertainty had agreement with the RS on the origin of the outbreak whereas both CNTR and POP disagreed. Our results suggest that assigning geospatial uncertainty to taxa benefits estimation of virus phylogeography as compared to ad-hoc heuristics. We also found that, in general, there was limited difference in results regardless of how the sampling uncertainty was assigned; uniform distribution or split between two locations did not greatly impact posterior results. This framework is available in BEAST v.1.10. In future work, we will explore viruses beyond influenza. We will also develop a web interface for researchers to use our language processing methods to find and assign uncertainty to alternative potential locations for virus phylogeography.
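Constructing the per-taxon location priors for scenarios (i) and (ii) amounts to allocating probability mass over candidate locations. A sketch with invented location names; the mapping of scenario labels to masses follows one plausible reading of the description above.

```python
def location_prior(candidates, reported, reported_mass, split_with=None):
    """Per-taxon prior over candidate sampling locations: reported_mass on
    the reported location; the remainder either goes to one named
    alternative or is spread uniformly over all other candidates."""
    prior = {c: 0.0 for c in candidates}
    prior[reported] = reported_mass
    rest = 1.0 - reported_mass
    others = [c for c in candidates if c != reported]
    if split_with is not None:
        prior[split_with] = rest
    else:
        for c in others:
            prior[c] = rest / len(others)
    return prior

states = ["Cairo", "Giza", "Luxor", "Aswan"]  # invented candidate set
uniform_30 = location_prior(states, "Cairo", 0.70)           # '30' scenario
split_30_70 = location_prior(states, "Cairo", 0.70, "Giza")  # '30/70' scenario
```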

18.
Ann Stat ; 46(4): 1481-1512, 2018 Aug.
Article in English | MEDLINE | ID: mdl-30344357

ABSTRACT

It is common in phylogenetics to have some, perhaps partial, information about the overall evolutionary tree of a group of organisms and wish to find an evolutionary tree of a specific gene for those organisms. There may not be enough information in the gene sequences alone to accurately reconstruct the correct "gene tree." Although the gene tree may deviate from the "species tree" due to a variety of genetic processes, in the absence of evidence to the contrary it is parsimonious to assume that they agree. A common statistical approach in these situations is to develop a likelihood penalty to incorporate such additional information. Recent studies using simulation and empirical data suggest that a likelihood penalty quantifying concordance with a species tree can significantly improve the accuracy of gene tree reconstruction compared to using sequence data alone. However, the consistency of such an approach has not yet been established, nor have convergence rates been bounded. Because phylogenetics is a non-standard inference problem, the standard theory does not apply. In this paper, we propose a penalized maximum likelihood estimator for gene tree reconstruction, where the penalty is the square of the Billera-Holmes-Vogtmann geodesic distance from the gene tree to the species tree. We prove that this method is consistent, and derive its convergence rate for estimating the discrete gene tree structure and continuous edge lengths (representing the amount of evolution that has occurred on that branch) simultaneously. We find that the regularized estimator is "adaptive fast converging," meaning that it can reconstruct all edges of length greater than any given threshold from gene sequences of polynomial length. Our method does not require the species tree to be known exactly; in fact, our asymptotic theory holds for any such guide tree.
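The mechanism of the penalty can be seen in a scalar analogue: penalizing squared distance to a guide value shrinks the maximum likelihood estimate toward it, just as the squared BHV-distance penalty pulls the gene tree toward the species tree. This toy replaces tree space with estimating a normal mean under unit variance.

```python
def penalized_mean(xs, guide, lam):
    """Toy analogue of guide-tree-penalized ML: maximize the unit-variance
    normal log-likelihood of xs minus lam * (theta - guide)^2. Setting the
    derivative to zero gives a precision-weighted average of the sample
    mean and the guide."""
    n = len(xs)
    xbar = sum(xs) / n
    return (n * xbar + 2.0 * lam * guide) / (n + 2.0 * lam)

# lam = 0 recovers the unpenalized MLE; larger lam pulls toward the guide
print(penalized_mean([1.0, 2.0, 3.0], 0.0, 0.0))  # → 2.0
print(penalized_mean([1.0, 2.0, 3.0], 0.0, 1.5))  # → 1.0
```

As the sample grows (n large), the data term dominates and the guide's influence vanishes, which mirrors why such penalized estimators can remain consistent even with an imperfect guide tree.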

19.
Curr Opin Virol ; 31: 24-32, 2018 08.
Article in English | MEDLINE | ID: mdl-30248578

ABSTRACT

Time-stamped, trait-annotated phylogenetic trees built from virus genome data are increasingly used for outbreak investigation and monitoring ongoing epidemics. This routinely involves reconstructing the spatial and demographic processes from large data sets to help unveil the patterns and drivers of virus spread. Such phylodynamic inferences can, however, become quite time-consuming as the dimensions of the data increase, which has led to a myriad of approaches that aim to tackle this complexity. To elucidate the current state of the art in the field of phylodynamics, we discuss recent developments in Bayesian inference and accompanying software, highlighting methods for improving computational efficiency and relevant visualisation tools. As an alternative to fully Bayesian approaches, we touch upon conditional software pipelines that compromise between statistical coherence and turnaround time, and we highlight the available software packages. Finally, we outline future directions that may facilitate the large-scale tracking of epidemics in near real time.


Subjects
Computational Biology/trends , Genome, Viral , Phylogeny , Software , Viruses/genetics , Algorithms , Bayes Theorem , Computational Biology/methods , Humans
20.
Philos Trans A Math Phys Eng Sci ; 376(2128)2018 Sep 13.
Article in English | MEDLINE | ID: mdl-30082302

ABSTRACT

Concerns over reproducibility in science extend to research using existing healthcare data; many observational studies investigating the same topic produce conflicting results, even when using the same data. To address this problem, we propose a paradigm shift. The current paradigm centres on generating and publishing (or not) one estimate at a time, each using a unique study design with unknown reliability. The new paradigm advocates for high-throughput observational studies using consistent and standardized methods, allowing evaluation, calibration and unbiased dissemination to generate a more reliable and complete evidence base. We demonstrate this new paradigm by comparing all depression treatments for a set of outcomes, producing 17 718 hazard ratios, each using methodology on par with current best practice. We furthermore include control hypotheses to evaluate and calibrate our evidence-generation process. Results show good transitivity and consistency between databases, and agree with four out of the five findings from clinical trials. The distribution of effect size estimates reported in the literature reveals an absence of small or null effects, with a sharp cut-off at p = 0.05. No such phenomena were observed in our results, suggesting more complete and more reliable evidence. This article is part of a discussion meeting issue 'The growing ubiquity of algorithms in society: implications, impacts and innovations'.
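The "evaluation and calibration" step above relies on control hypotheses: negative controls with a true hazard ratio of 1 reveal the systematic error of the evidence-generation process, which can then be used to calibrate p-values for the real estimates. A minimal sketch of that idea, simplified to ignore per-estimate standard errors (the control values below are made up for illustration):

```python
# Sketch: p-value calibration against an empirical null distribution
# estimated from negative-control effect estimates (true log HR = 0).
import math
from statistics import mean, stdev

def calibrated_p(log_hr, negative_control_log_hrs):
    """Two-sided p-value for log_hr, measured against the empirical null
    fitted to negative-control estimates rather than the theoretical null."""
    mu = mean(negative_control_log_hrs)      # systematic error (bias)
    sigma = stdev(negative_control_log_hrs)  # spread of the empirical null
    z = (log_hr - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))

# Hypothetical negative-control estimates, centred near zero:
nulls = [0.05, -0.10, 0.12, 0.02, -0.06, 0.08, -0.03, 0.01]

# A hazard ratio of 2.0 remains clearly significant after calibration here:
print(calibrated_p(math.log(2.0), nulls))
```

When the negative controls scatter widely around zero, the empirical null is wide and previously "significant" estimates can lose significance, which is how calibration guards against the systematic error that conventional p-values ignore.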
