Search | BVS Regional Portal

1.

Comparing penalization methods for linear models on large observational health data.

Fridgeirsson, Egill A; Williams, Ross; Rijnbeek, Peter; Suchard, Marc A; Reps, Jenna M.

J Am Med Inform Assoc ; 2024 May 20.

Article in English | MEDLINE | ID: mdl-38767857

ABSTRACT

OBJECTIVE: This study evaluates regularization variants in logistic regression (L1, L2, ElasticNet, Adaptive L1, Adaptive ElasticNet, Broken adaptive ridge [BAR], and Iterative hard thresholding [IHT]) for discrimination and calibration performance, focusing on both internal and external validation. MATERIALS AND METHODS: We use data from 5 US claims and electronic health record databases and develop models for various outcomes in a major depressive disorder patient population. We externally validate all models in the other databases. We use a train-test split of 75%/25% and evaluate performance with discrimination and calibration. Statistical analysis for difference in performance uses Friedman's test and critical difference diagrams. RESULTS: Of the 840 models we develop, L1 and ElasticNet emerge as superior in both internal and external discrimination, with a notable AUC difference. BAR and IHT show the best internal calibration, without a clear external calibration leader. ElasticNet typically has larger model sizes than L1. Methods like IHT and BAR, while slightly less discriminative, significantly reduce model complexity. CONCLUSION: L1 and ElasticNet offer the best discriminative performance in logistic regression for healthcare predictions, maintaining robustness across validations. For simpler, more interpretable models, L0-based methods (IHT and BAR) are advantageous, providing greater parsimony and calibration with fewer features. This study aids in selecting suitable regularization techniques for healthcare prediction models, balancing performance, complexity, and interpretability.
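To make the contrast between L1-type and L0-type penalties concrete, here is a minimal Python sketch of the two thresholding operators that commonly underlie them. It is an illustration only: the abstract does not describe the solvers the authors used, and the function names `soft_threshold` and `hard_threshold` and the toy coefficient vector are hypothetical. Soft thresholding is the proximal step associated with an L1 penalty; hard thresholding is the projection step that iterative hard thresholding alternates with gradient updates.

```python
def soft_threshold(beta, lam):
    """L1 proximal step: shrink every coefficient toward zero by lam,
    setting those with magnitude <= lam exactly to zero."""
    return [
        (abs(b) - lam) * (1.0 if b > 0 else -1.0) if abs(b) > lam else 0.0
        for b in beta
    ]


def hard_threshold(beta, k):
    """IHT-style step: keep the k largest-magnitude coefficients unchanged
    and zero out all the others (an L0-type constraint)."""
    if k <= 0:
        return [0.0] * len(beta)
    ranked = sorted(range(len(beta)), key=lambda i: abs(beta[i]), reverse=True)
    keep = set(ranked[:k])
    return [b if i in keep else 0.0 for i, b in enumerate(beta)]


# Hypothetical coefficients after a gradient step (not data from the study).
beta = [2.0, -0.3, 0.05, -1.2]
print(soft_threshold(beta, 0.5))  # survivors are shrunk as well as selected
print(hard_threshold(beta, 2))    # survivors keep their original size
```

The sketch shows why the two families behave differently: soft thresholding biases the retained coefficients toward zero, while hard thresholding fixes the model size directly, which is consistent with the smaller models the abstract reports for IHT and BAR.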

2.

Assessing Covariate Balance with Small Sample Sizes.

Hripcsak, George; Zhang, Linying; Li, Kelly; Suchard, Marc A; Ryan, Patrick B; Schuemie, Martijn J.

medRxiv ; 2024 Apr 24.

Article in English | MEDLINE | ID: mdl-38712282

ABSTRACT

Propensity score adjustment addresses confounding by balancing covariates in subject treatment groups through matching, stratification, inverse probability weighting, etc. Diagnostics ensure that the adjustment has been effective. A common technique is to check whether the standardized mean difference for each relevant covariate is less than a threshold like 0.1. For small sample sizes, the probability of falsely rejecting the validity of a study because of chance imbalance when no underlying imbalance exists approaches 1. We propose an alternative diagnostic that checks whether the standardized mean difference statistically significantly exceeds the threshold. Through simulation and real-world data, we find that this diagnostic achieves a better trade-off of type 1 error rate and power than standard nominal threshold tests and not testing for sample sizes from 250 to 4000 and for 20 to 100,000 covariates. In network studies, meta-analysis of effect estimates must be accompanied by meta-analysis of the diagnostics or else systematic confounding may overwhelm the estimated effect. Our procedure for statistically testing balance at both the database level and the meta-analysis level achieves the best balance of type 1 error rate and power. Our procedure supports the review of large numbers of covariates, enabling more rigorous diagnostics.
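The difference between the nominal threshold check and a significance-based check can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the authors' procedure: the large-sample standard error formula, the one-sided z-test on the absolute standardized mean difference, the function names, and the toy data are all assumptions introduced here.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def smd(treated, control):
    """Standardized mean difference using the average of the two sample variances."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


def smd_exceeds_threshold(treated, control, threshold=0.1, alpha=0.05):
    """Return (d, p_value, flagged), where flagged is True only if |d| is
    significantly larger than the threshold (one-sided z-test)."""
    d = smd(treated, control)
    n1, n0 = len(treated), len(control)
    # Assumed large-sample approximation to the standard error of d.
    se = sqrt((n1 + n0) / (n1 * n0) + d * d / (2 * (n1 + n0)))
    z = (abs(d) - threshold) / se
    p_value = 1 - NormalDist().cdf(z)
    return d, p_value, p_value < alpha


# Hypothetical binary covariate in two small groups (not data from the study).
treated = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
control = [0, 1, 0, 1, 0, 0, 1, 0, 1, 0]
d, p, flagged = smd_exceeds_threshold(treated, control)
print(f"SMD={d:.3f}, nominal check fails: {abs(d) > 0.1}, significant: {flagged}")
```

With only ten subjects per group, the observed difference exceeds 0.1 and would fail the nominal check, yet it is not statistically distinguishable from the threshold, which is the small-sample situation the abstract describes.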

3.

Random-effects substitution models for phylogenetics via scalable gradient approximations.

Magee, Andrew F; Holbrook, Andrew J; Pekar, Jonathan E; Caviedes-Solis, Itzue W; Matsen IV, Frederick A; Baele, Guy; Wertheim, Joel O; Ji, Xiang; Lemey, Philippe; Suchard, Marc A.

Syst Biol ; 2024 May 07.

Article in English | MEDLINE | ID: mdl-38712512

ABSTRACT

Phylogenetic and discrete-trait evolutionary inference depend heavily on an appropriate characterization of the underlying character substitution process. In this paper, we present random-effects substitution models that extend common continuous-time Markov chain models into a richer class of processes capable of capturing a wider variety of substitution dynamics. As these random-effects substitution models often require many more parameters than their usual counterparts, inference can be both statistically and computationally challenging. Thus, we also propose an efficient approach to compute an approximation to the gradient of the data likelihood with respect to all unknown substitution model parameters. We demonstrate that this approximate gradient enables scaling of sampling-based inference, namely Bayesian inference via Hamiltonian Monte Carlo, under random-effects substitution models across large trees and state-spaces. Applied to a dataset of 583 SARS-CoV-2 sequences, an HKY model with random-effects shows strong signals of nonreversibility in the substitution process, and posterior predictive model checks clearly show that it is a more adequate model than a reversible model. When analyzing the pattern of phylogeographic spread of 1441 influenza A virus (H3N2) sequences between 14 regions, a random-effects phylogeographic substitution model infers that air travel volume adequately predicts almost all dispersal rates. A random-effects state-dependent substitution model reveals no evidence for an effect of arboreality on the swimming mode in the tree frog subfamily Hylinae. Simulations reveal that random-effects substitution models can accommodate both negligible and radical departures from the underlying base substitution model. We show that our gradient-based inference approach is over an order of magnitude more time efficient than conventional approaches.

4.

Massive Parallelization of Massive Sample-size Survival Analysis.

Yang, Jianxiao; Schuemie, Martijn J; Ji, Xiang; Suchard, Marc A.

J Comput Graph Stat ; 33(1): 289-302, 2024.

Article in English | MEDLINE | ID: mdl-38716090

ABSTRACT

Large-scale observational health databases are increasingly popular for conducting comparative effectiveness and safety studies of medical products. However, the increasing number of patients poses computational challenges when fitting survival regression models in such studies. In this paper, we use graphics processing units (GPUs) to parallelize the computational bottlenecks of massive sample-size survival analyses. Specifically, we develop and apply time- and memory-efficient single-pass parallel scan algorithms for Cox proportional hazards models and forward-backward parallel scan algorithms for Fine-Gray models for analysis with and without a competing risk using a cyclic coordinate descent optimization approach. We demonstrate that GPUs accelerate the computation of fitting these complex models in large databases by orders of magnitude as compared to traditional multi-core CPU parallelism. Our implementation enables efficient large-scale observational studies involving millions of patients and thousands of patient characteristics. The above implementation is available in the open-source R package Cyclops (Suchard et al., 2013).
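The reason a scan (prefix sum) fits the Cox model can be sketched in Python. Once subjects are sorted by decreasing follow-up time, the risk-set denominators of the partial likelihood are running sums of exp(linear predictor), and running sums are what a parallel scan computes. The sketch below is sequential and makes simplifying assumptions that are not taken from the paper: no tied event times, no GPU code, and hypothetical function names and data. It does not reproduce the Cyclops implementation or the Fine-Gray forward-backward scan.

```python
from itertools import accumulate
from math import exp, log


def cox_log_partial_likelihood(times, events, eta):
    """Cox log partial likelihood via one prefix sum, assuming no tied times.
    eta[i] is the linear predictor (x_i . beta) for subject i."""
    order = sorted(range(len(times)), key=lambda i: times[i], reverse=True)
    # Inclusive scan over subjects sorted by decreasing time: entry j sums
    # exp(eta) over everyone still at risk at the j-th subject's time.
    risk_sums = list(accumulate(exp(eta[i]) for i in order))
    return sum(
        eta[i] - log(risk_sums[j]) for j, i in enumerate(order) if events[i]
    )


def cox_log_partial_likelihood_naive(times, events, eta):
    """Reference O(n^2) version that rebuilds each risk set from scratch."""
    total = 0.0
    for i in range(len(times)):
        if events[i]:
            denom = sum(
                exp(eta[k]) for k in range(len(times)) if times[k] >= times[i]
            )
            total += eta[i] - log(denom)
    return total


# Hypothetical toy data: follow-up times, event indicators, linear predictors.
times = [5.0, 2.0, 8.0, 3.0]
events = [1, 0, 1, 1]
eta = [0.2, -0.1, 0.4, 0.0]
print(cox_log_partial_likelihood(times, events, eta))
print(cox_log_partial_likelihood_naive(times, events, eta))
```

The two functions agree up to floating-point rounding; the scan-based version does linear work after sorting, and on parallel hardware the prefix sum itself can be evaluated in logarithmic depth.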

5.

The genomic evolutionary dynamics and global circulation patterns of respiratory syncytial virus.

Langedijk, Annefleur C; Vrancken, Bram; Lebbink, Robert Jan; Wilkins, Deidre; Kelly, Elizabeth J; Baraldi, Eugenio; Mascareñas de Los Santos, Abiel Homero; Danilenko, Daria M; Choi, Eun Hwa; Palomino, María Angélica; Chi, Hsin; Keller, Christian; Cohen, Robert; Papenburg, Jesse; Pernica, Jeffrey; Greenough, Anne; Richmond, Peter; Martinón-Torres, Federico; Heikkinen, Terho; Stein, Renato T; Hosoya, Mitsuaki; Nunes, Marta C; Verwey, Charl; Evers, Anouk; Kragten-Tabatabaie, Leyla; Suchard, Marc A; Kosakovsky Pond, Sergei L; Poletto, Chiara; Colizza, Vittoria; Lemey, Philippe; Bont, Louis J.

Nat Commun ; 15(1): 3083, 2024 Apr 10.

Article in English | MEDLINE | ID: mdl-38600104

ABSTRACT

Respiratory syncytial virus (RSV) is a leading cause of acute lower respiratory tract infection in young children and the second leading cause of infant death worldwide. While global circulation has been extensively studied for respiratory viruses such as seasonal influenza, and more recently also in great detail for SARS-CoV-2, a lack of global multi-annual sampling of complete RSV genomes limits our understanding of RSV molecular epidemiology. Here, we capitalise on the genomic surveillance by the INFORM-RSV study and apply phylodynamic approaches to uncover how selection and neutral epidemiological processes shape RSV diversity. Using complete viral genome sequences, we show similar patterns of site-specific diversifying selection among RSVA and RSVB and recover the imprint of non-neutral epidemic processes on their genealogies. Using a phylogeographic approach, we provide evidence for air travel governing the global patterns of RSVA and RSVB spread, which results in a considerable degree of phylogenetic mixing across countries. Our findings highlight the potential of systematic global RSV genomic surveillance for transforming our understanding of global RSV spread.

Subject(s)

Respiratory Syncytial Virus Infections , Respiratory Syncytial Virus, Human , Respiratory Tract Infections , Infant , Child , Humans , Child, Preschool , Respiratory Syncytial Virus Infections/epidemiology , Respiratory Syncytial Virus Infections/genetics , Phylogeny , Respiratory Syncytial Virus, Human/genetics , Genomics , Respiratory Tract Infections/epidemiology

6.

How fast are viruses spreading in the wild?

Dellicour, Simon; Bastide, Paul; Rocu, Pauline; Fargette, Denis; Hardy, Olivier J; Suchard, Marc A; Guindon, Stéphane; Lemey, Philippe.

bioRxiv ; 2024 Apr 11.

Article in English | MEDLINE | ID: mdl-38645268

ABSTRACT

Genomic data collected from viral outbreaks can be exploited to reconstruct the dispersal history of viral lineages in a two-dimensional space using continuous phylogeographic inference. These spatially explicit reconstructions can subsequently be used to estimate dispersal metrics that reveal the dispersal dynamics and allow evaluation of the capacity to spread among hosts. Heterogeneous sampling intensity of genomic sequences can, however, affect the accuracy of dispersal insights gained through phylogeographic inference. In our study, we implement a simulation framework to evaluate the robustness of three dispersal metrics - a lineage dispersal velocity, a diffusion coefficient, and an isolation-by-distance signal metric - to the sampling effort. Our results reveal that both the diffusion coefficient and isolation-by-distance signal metrics appear to be robust to the number of samples considered for the phylogeographic reconstruction. We then use these two dispersal metrics to compare the dispersal pattern and capacity of various viruses spreading in animal populations. Our comparative analysis reveals a broad range of isolation-by-distance patterns and diffusion coefficients mostly reflecting the dispersal capacity of the main infected host species but also, in some cases, the likely signature of rapid and/or long-distance dispersal events driven by human-mediated movements through animal trade. Overall, our study provides key recommendations for the lineage dispersal metrics to consider in future studies and illustrates their application to compare the spread of viruses in various settings.

7.

Integrating dynamical modeling and phylogeographic inference to characterize global influenza circulation.

Parino, Francesco; Gustani-Buss, Emanuele; Bedford, Trevor; Suchard, Marc A; Trovão, Nídia Sequeira; Rambaut, Andrew; Colizza, Vittoria; Poletto, Chiara; Lemey, Philippe.

medRxiv ; 2024 Mar 15.

Article in English | MEDLINE | ID: mdl-38559244

ABSTRACT

Global seasonal influenza circulation involves a complex interplay between local (seasonality, demography, host immunity) and global factors (international mobility) shaping recurrent epidemic patterns. No studies so far have reconciled the two spatial levels, evaluating the coupling between national epidemics, considering heterogeneous coverage of epidemiological and virological data, integrating different data sources. We propose a novel combined approach based on a dynamical model of global influenza spread (GLEAM), integrating high-resolution demographic and mobility data, and a generalized linear model of phylogeographic diffusion that accounts for time-varying migration rates. Seasonal migration fluxes across global macro-regions simulated with GLEAM are tested as phylogeographic predictors to provide model validation and calibration based on genetic data. Seasonal fluxes obtained with a specific transmissibility peak time and recurrent travel outperformed the raw air-transportation predictor, previously considered as optimal indicator of global influenza migration. Influenza A subtypes supported autumn-winter reproductive number as high as 2.25 and an average immunity duration of 2 years. Similar dynamics were preferred by influenza B lineages, with a lower autumn-winter reproductive number. Comparing simulated epidemic profiles against FluNet data offered comparatively limited resolution power. The multiscale approach enables model selection yielding a novel computational framework for describing global influenza dynamics at different scales - local transmission and national epidemics vs. international coupling through mobility and imported cases. Our findings have important implications to improve preparedness against seasonal influenza epidemics. The approach can be generalized to other epidemic contexts, such as emerging disease outbreaks to improve the flexibility and predictive power of modeling.

8.

Comparative safety and effectiveness of angiotensin converting enzyme inhibitors and thiazides and thiazide-like diuretics under strict monotherapy.

Anand, Tara V; Bu, Fan; Schuemie, Martijn J; Suchard, Marc A; Hripcsak, George.

J Clin Hypertens (Greenwich) ; 26(4): 425-430, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38501749

ABSTRACT

Previous work comparing safety and effectiveness outcomes for new initiators of angiotensin-converting enzyme inhibitors (ACEi) and thiazides demonstrated more favorable outcomes for thiazides, although cohort definitions allowed for addition of a second antihypertensive medication after a week of monotherapy. Here, we modify the monotherapy definition, imposing exit from cohorts upon addition of another antihypertensive medication. We determine hazard ratios (HR) for 55 safety and effectiveness outcomes over six databases and compare results to earlier findings. We find that, for all primary outcomes, the statistically significant differences in effectiveness between ACEi and thiazides were not replicated (HRs: 1.11, 1.06, 1.12 for acute myocardial infarction, hospitalization with heart failure and stroke, respectively). While statistical significance is similarly lost for several safety outcomes, the safety profile of thiazides remains more favorable. Our results indicate a less striking difference in effectiveness of thiazides compared to ACEi and reflect some sensitivity to the monotherapy cohort definition modification.

Subject(s)

Angiotensin-Converting Enzyme Inhibitors , Hypertension , Humans , Angiotensin-Converting Enzyme Inhibitors/adverse effects , Antihypertensive Agents/adverse effects , Diuretics/adverse effects , Hypertension/drug therapy , Sodium Chloride Symporter Inhibitors/adverse effects , Thiazides/adverse effects

9.

Scalable gradients enable Hamiltonian Monte Carlo sampling for phylodynamic inference under episodic birth-death-sampling models.

Shao, Yucai; Magee, Andrew F; Vasylyeva, Tetyana I; Suchard, Marc A.

PLoS Comput Biol ; 20(3): e1011640, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38551979

ABSTRACT

Birth-death models play a key role in phylodynamic analysis for their interpretation in terms of key epidemiological parameters. In particular, models with piecewise-constant rates varying at different epochs in time, to which we refer as episodic birth-death-sampling (EBDS) models, are valuable for their reflection of changing transmission dynamics over time. A challenge, however, that persists with current time-varying model inference procedures is their lack of computational efficiency. This limitation hinders the full utilization of these models in large-scale phylodynamic analyses, especially when dealing with high-dimensional parameter vectors that exhibit strong correlations. We present here a linear-time algorithm to compute the gradient of the birth-death model sampling density with respect to all time-varying parameters, and we implement this algorithm within a gradient-based Hamiltonian Monte Carlo (HMC) sampler to alleviate the computational burden of conducting inference under a wide variety of structures of, as well as priors for, EBDS processes. We assess this approach using three different real-world data examples, including the HIV epidemic in Odesa, Ukraine, seasonal influenza A/H3N2 virus dynamics in New York State, USA, and the Ebola outbreak in West Africa. HMC sampling exhibits a substantial efficiency boost, delivering a 10- to 200-fold increase in minimum effective sample size per unit-time, in comparison to a Metropolis-Hastings-based approach. Additionally, we show the robustness of our implementation in both allowing for flexible prior choices and in modeling the transmission dynamics of various pathogens by accurately capturing the changing trend of viral effective reproductive number.

Subject(s)

Epidemics , Hemorrhagic Fever, Ebola , Influenza, Human , Humans , Influenza A Virus, H3N2 Subtype , Algorithms , Influenza, Human/epidemiology , Hemorrhagic Fever, Ebola/epidemiology , Monte Carlo Method

10.

Similar Risk of Kidney Failure among Patients with Blinding Diseases Who Receive Ranibizumab, Aflibercept, and Bevacizumab: An Observational Health Data Sciences and Informatics Network Study.

Cai, Cindy X; Nishimura, Akihiko; Bowring, Mary G; Westlund, Erik; Tran, Diep; Ng, Jia H; Nagy, Paul; Cook, Michael; McLeggon, Jody-Ann; DuVall, Scott L; Matheny, Michael E; Golozar, Asieh; Ostropolets, Anna; Minty, Evan; Desai, Priya; Bu, Fan; Toy, Brian; Hribar, Michelle; Falconer, Thomas; Zhang, Linying; Lawrence-Archer, Laurence; Boland, Michael V; Goetz, Kerry; Hall, Nathan; Shoaibi, Azza; Reps, Jenna; Sena, Anthony G; Blacketer, Clair; Swerdel, Joel; Jhaveri, Kenar D; Lee, Edward; Gilbert, Zachary; Zeger, Scott L; Crews, Deidra C; Suchard, Marc A; Hripcsak, George; Ryan, Patrick B.

Ophthalmol Retina ; 2024 Mar 20.

Article in English | MEDLINE | ID: mdl-38519026

ABSTRACT

PURPOSE: To characterize the incidence of kidney failure associated with intravitreal anti-VEGF exposure; and compare the risk of kidney failure in patients treated with ranibizumab, aflibercept, or bevacizumab. DESIGN: Retrospective cohort study across 12 databases in the Observational Health Data Sciences and Informatics (OHDSI) network. SUBJECTS: Subjects aged ≥ 18 years with ≥ 3 monthly intravitreal anti-VEGF medications for a blinding disease (diabetic retinopathy, diabetic macular edema, exudative age-related macular degeneration, or retinal vein occlusion). METHODS: The standardized incidence proportions and rates of kidney failure while on treatment with anti-VEGF were calculated. For each comparison (e.g., aflibercept versus ranibizumab), patients from each group were matched 1:1 using propensity scores. Cox proportional hazards models were used to estimate the risk of kidney failure while on treatment. A random effects meta-analysis was performed to combine each database's hazard ratio (HR) estimate into a single network-wide estimate. MAIN OUTCOME MEASURES: Incidence of kidney failure while on anti-VEGF treatment, and time from cohort entry to kidney failure. RESULTS: Of the 6.1 million patients with blinding diseases, 37 189 who received ranibizumab, 39 447 aflibercept, and 163 611 bevacizumab were included; the total treatment exposure time was 161 724 person-years. The average standardized incidence proportion of kidney failure was 678 per 100 000 persons (range, 0-2389), and incidence rate 742 per 100 000 person-years (range, 0-2661). The meta-analysis HR of kidney failure comparing aflibercept with ranibizumab was 1.01 (95% confidence interval [CI], 0.70-1.47; P = 0.45), ranibizumab with bevacizumab 0.95 (95% CI, 0.68-1.32; P = 0.62), and aflibercept with bevacizumab 0.95 (95% CI, 0.65-1.39; P = 0.60). 
CONCLUSIONS: There was no substantially different relative risk of kidney failure between those who received ranibizumab, bevacizumab, or aflibercept. Practicing ophthalmologists and nephrologists should be aware of the risk of kidney failure among patients receiving intravitreal anti-VEGF medications and that there is little empirical evidence to preferentially choose among the specific intravitreal anti-VEGF agents. FINANCIAL DISCLOSURES: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
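The pooling step named in the METHODS can be sketched as follows. The abstract says only that a random effects meta-analysis combined the per-database hazard ratios; the DerSimonian-Laird estimator used below is an assumed choice for illustration, and the function name and input values are hypothetical rather than taken from the study.

```python
from math import exp, log, sqrt


def dersimonian_laird(log_hrs, ses):
    """Pool per-database log hazard ratios with a DerSimonian-Laird
    random-effects model. Returns (HR, CI lower, CI upper, tau^2)."""
    w = [1 / se**2 for se in ses]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * y for wi, y in zip(w, log_hrs)) / sum(w)
    # Cochran's Q measures the spread of estimates around the fixed-effect mean.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_hrs))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(log_hrs) - 1)) / c)  # between-database variance
    w_star = [1 / (se**2 + tau2) for se in ses]
    pooled = sum(wi * y for wi, y in zip(w_star, log_hrs)) / sum(w_star)
    se_pooled = sqrt(1 / sum(w_star))
    return (
        exp(pooled),
        exp(pooled - 1.96 * se_pooled),
        exp(pooled + 1.96 * se_pooled),
        tau2,
    )


# Hypothetical per-database estimates (illustrative values only).
hrs = [0.90, 1.10, 1.00]
ses = [0.20, 0.25, 0.30]  # standard errors of the log hazard ratios
hr, lower, upper, tau2 = dersimonian_laird([log(h) for h in hrs], ses)
print(f"pooled HR {hr:.2f} (95% CI {lower:.2f}-{upper:.2f}), tau^2 = {tau2:.3f}")
```

When the databases agree closely, the estimated between-database variance is zero and the result coincides with a fixed-effect inverse-variance average; otherwise the weights are flattened and the interval widens.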

11.

Dispersal history of SARS-CoV-2 in Galicia, Spain.

Gallego-García, Pilar; Estévez-Gómez, Nuria; De Chiara, Loretta; Alvariño, Pilar; Juiz-González, Pedro M; Torres-Beceiro, Isabel; Poza, Margarita; Vallejo, Juan A; Rumbo-Feal, Soraya; Conde-Pérez, Kelly; Aja-Macaya, Pablo; Ladra, Susana; Moreno-Flores, Antonio; Gude-González, María J; Coira, Amparo; Aguilera, Antonio; Costa-Alcalde, José J; Trastoy, Rocío; Barbeito-Castiñeiras, Gema; García-Souto, Daniel; Tubio, José M C; Trigo-Daporta, Matilde; Camacho-Zamora, Pablo; Costa, Juan García; González-Domínguez, María; Canoura-Fernández, Luis; Glez-Peña, Daniel; Pérez-Castro, Sonia; Cabrera, Jorge J; Daviña-Núñez, Carlos; Godoy-Diz, Montserrat; Treinta-Álvarez, Ana Belén; Veiga, Maria Isabel; Sousa, João Carlos; Osório, Nuno S; Comas, Iñaki; González-Candelas, Fernando; Hong, Samuel L; Bollen, Nena; Dellicour, Simon; Baele, Guy; Suchard, Marc A; Lemey, Philippe; Agulla, Andrés; Bou, Germán; Alonso-García, Pilar; Pérez-Del-Molino, María Luisa; García-Campello, Marta; Paz-Vidal, Isabel; Regueiro, Benito.

medRxiv ; 2024 Feb 28.

Article in English | MEDLINE | ID: mdl-38463998

ABSTRACT

The dynamics of SARS-CoV-2 transmission are influenced by a variety of factors, including social restrictions and the emergence of distinct variants. In this study, we delve into the origins and dissemination of the Alpha, Delta, and Omicron variants of concern in Galicia, northwest Spain. For this, we leveraged genomic data collected by the EPICOVIGAL Consortium and from the GISAID database, along with mobility information from other Spanish regions and foreign countries. Our analysis indicates that initial introductions during the Alpha phase were predominantly from other Spanish regions and France. However, as the pandemic progressed, introductions from Portugal and the USA became increasingly significant. Notably, Galicia's major coastal cities emerged as critical hubs for viral transmission, highlighting their role in sustaining and spreading the virus. This research emphasizes the critical role of regional connectivity in the spread of SARS-CoV-2 and offers essential insights for enhancing public health strategies and surveillance measures.

12.

Authors' Response to Huang et al.'s Comment on "Serially Combining Epidemiological Designs Does Not Improve Overall Signal Detection in Vaccine Safety Surveillance".

Bu, Fan; Arshad, Faaizah; Hripcsak, George; Ryan, Patrick B; Schuemie, Martijn J; Suchard, Marc A.

Drug Saf ; 47(4): 403-404, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38441750

Subject(s)

Vaccines , Humans , Vaccines/adverse effects

13.

On the surprising effectiveness of a simple matrix exponential derivative approximation, with application to global SARS-CoV-2.

Didier, Gustavo; Glatt-Holtz, Nathan E; Holbrook, Andrew J; Magee, Andrew F; Suchard, Marc A.

Proc Natl Acad Sci U S A ; 121(3): e2318989121, 2024 Jan 16.

Article in English | MEDLINE | ID: mdl-38215186

ABSTRACT

The continuous-time Markov chain (CTMC) is the mathematical workhorse of evolutionary biology. Learning CTMC model parameters using modern, gradient-based methods requires the derivative of the matrix exponential evaluated at the CTMC's infinitesimal generator (rate) matrix. Motivated by the derivative's extreme computational complexity as a function of state space cardinality, recent work demonstrates the surprising effectiveness of a naive, first-order approximation for a host of problems in computational biology. In response to this empirical success, we obtain rigorous deterministic and probabilistic bounds for the error accrued by the naive approximation and establish a "blessing of dimensionality" result that is universal for a large class of rate matrices with random entries. Finally, we apply the first-order approximation within surrogate-trajectory Hamiltonian Monte Carlo for the analysis of the early spread of Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) across 44 geographic regions that comprise a state space of unprecedented dimensionality for unstructured (flexible) CTMC models within evolutionary biology.
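The quantity at issue can be written out explicitly. The abstract does not state the formula, so the following is a sketch based on the standard integral representation of the matrix exponential derivative; the symbols $Q$ (rate matrix), $\theta$ (a scalar model parameter), and $t$ (elapsed time) are notation assumed here, and the paper's exact form of the approximation, including the ordering of the factors, may differ.

```latex
% Exact derivative of the transition matrix e^{tQ} with respect to a parameter \theta:
\frac{\partial}{\partial\theta} e^{tQ}
  = \int_0^t e^{sQ} \, \frac{\partial Q}{\partial\theta} \, e^{(t-s)Q} \, \mathrm{d}s .

% If \partial Q / \partial\theta commuted with Q, the integrand would not depend on s
% and the integral would collapse to a single product. A naive first-order
% approximation uses that collapsed form even when the matrices do not commute:
\frac{\partial}{\partial\theta} e^{tQ}
  \approx t \, \frac{\partial Q}{\partial\theta} \, e^{tQ} .
```

The exact integral requires far more work per parameter than a single matrix product, which is the computational burden the abstract refers to; the approximation needs only the transition matrix $e^{tQ}$ that likelihood evaluation already computes.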

Subject(s)

COVID-19 , SARS-CoV-2 , Humans , Algorithms , COVID-19/epidemiology , Markov Chains

14.

Many-core algorithms for high-dimensional gradients on phylogenetic trees.

Gangavarapu, Karthik; Ji, Xiang; Baele, Guy; Fourment, Mathieu; Lemey, Philippe; Matsen, Frederick A; Suchard, Marc A.

Bioinformatics ; 40(2), 2024 02 01.

Article in English | MEDLINE | ID: mdl-38243701

ABSTRACT

MOTIVATION: Advancements in high-throughput genomic sequencing are delivering genomic pathogen data at an unprecedented rate, positioning statistical phylogenetics as a critical tool to monitor infectious diseases globally. This rapid growth spurs the need for efficient inference techniques, such as Hamiltonian Monte Carlo (HMC) in a Bayesian framework, to estimate parameters of these phylogenetic models where the dimensions of the parameters increase with the number of sequences N. HMC requires repeated calculation of the gradient of the data log-likelihood with respect to (wrt) all branch-length-specific (BLS) parameters that traditionally takes O(N^2) operations using the standard pruning algorithm. A recent study proposes an approach to calculate this gradient in O(N), enabling researchers to take advantage of gradient-based samplers such as HMC. The CPU implementation of this approach makes the calculation of the gradient computationally tractable for nucleotide-based models but falls short in performance for larger state-space size models, such as Markov-modulated and codon models. Here, we describe novel massively parallel algorithms to calculate the gradient of the log-likelihood wrt all BLS parameters that take advantage of graphics processing units (GPUs) and result in many-fold higher speedups over previous CPU implementations. RESULTS: We benchmark these GPU algorithms on three computing systems using three evolutionary inference examples exploring complete genomes from 997 dengue viruses, 62 carnivore mitochondria and 49 yeasts, and observe a >128-fold speedup over the CPU implementation for codon-based models and >8-fold speedup for nucleotide-based models. As a practical demonstration, we also estimate the timing of the first introduction of West Nile virus into the continental United States under a codon model with a relaxed molecular clock from 104 full viral genomes, an inference task previously intractable.
AVAILABILITY AND IMPLEMENTATION: We provide an implementation of our GPU algorithms in BEAGLE v4.0.0 (https://github.com/beagle-dev/beagle-lib), an open-source library for statistical phylogenetics that enables parallel calculations on multi-core CPUs and GPUs. We employ a BEAGLE-implementation using the Bayesian phylogenetics framework BEAST (https://github.com/beast-dev/beast-mcmc).

Subject(s)

Algorithms , Software , Phylogeny , Bayes Theorem , Codon , Nucleotides

15.

Bayesian safety surveillance with adaptive bias correction.

Bu, Fan; Schuemie, Martijn J; Nishimura, Akihiko; Smith, Louisa H; Kostka, Kristin; Falconer, Thomas; McLeggon, Jody-Ann; Ryan, Patrick B; Hripcsak, George; Suchard, Marc A.

Stat Med ; 43(2): 395-418, 2024 01 30.

Article in English | MEDLINE | ID: mdl-38010062

ABSTRACT

Postmarket safety surveillance is an integral part of mass vaccination programs. Typically relying on sequential analysis of real-world health data as they accrue, safety surveillance is challenged by sequential multiple testing and by biases induced by residual confounding in observational data. The current standard approach based on the maximized sequential probability ratio test (MaxSPRT) fails to satisfactorily address these practical challenges and it remains a rigid framework that requires prespecification of the surveillance schedule. We develop an alternative Bayesian surveillance procedure that addresses both aforementioned challenges using a more flexible framework. To mitigate bias, we jointly analyze a large set of negative control outcomes that are adverse events with no known association with the vaccines in order to inform an empirical bias distribution, which we then incorporate into estimating the effect of vaccine exposure on the adverse event of interest through a Bayesian hierarchical model. To address multiple testing and improve on flexibility, at each analysis timepoint, we update a posterior probability in favor of the alternative hypothesis that vaccination induces higher risks of adverse events, and then use it for sequential detection of safety signals. Through an empirical evaluation using six US observational healthcare databases covering more than 360 million patients, we benchmark the proposed procedure against MaxSPRT on testing errors and estimation accuracy, under two epidemiological designs, the historical comparator and the self-controlled case series. We demonstrate that our procedure substantially reduces Type 1 error rates, maintains high statistical power and fast signal detection, and provides considerably more accurate estimation than MaxSPRT. Given the extensiveness of the empirical study which yields more than 7 million sets of results, we present all results in a public R ShinyApp. 
As an effort to promote open science, we provide full implementation of our method in the open-source R package EvidenceSynthesis.

Subject(s)

Adverse Drug Reaction Reporting Systems , Product Surveillance, Postmarketing , Vaccines , Humans , Bayes Theorem , Bias , Probability , Vaccines/adverse effects

16.

Risk assessment of SARS-CoV-2 replicating and evolving in animals.

Zhao, Jin; Kang, Mei; Wu, Hongyan; Sun, Bowen; Baele, Guy; He, Wan-Ting; Lu, Meng; Suchard, Marc A; Ji, Xiang; He, Na; Su, Shuo; Veit, Michael.

Trends Microbiol ; 32(1): 79-92, 2024 01.

Article in English | MEDLINE | ID: mdl-37541811

ABSTRACT

The retransmissions of SARS-CoV-2 from several mammals - primarily mink and white-tailed deer - to humans have raised concerns for the emergence of a new animal-derived SARS-CoV-2 variant to worsen the pandemic. Here, we discuss animal species that are susceptible to natural or experimental infection with SARS-CoV-2 and can transmit the virus to mates or humans. We describe cutting-edge techniques to assess the impact of a mutation in the viral spike (S) protein on its receptor and on antibody binding. Our review of spike sequences of animal-derived viruses identified nine unique amino acid exchanges in the receptor-binding domain (RBD) that are not present in any variant of concern (VOC). These mutations are present in SARS-CoV-2 found in companion animals such as dogs and cats, and they exhibit a higher frequency in SARS-CoV-2 found in mink and white-tailed deer, suggesting that sustained transmissions may contribute to maintaining novel mutations. Four of these exchanges, such as Leu452Met, could undermine acquired immune protection in humans while maintaining high affinity for the human angiotensin-converting enzyme 2 (ACE2) receptor. Finally, we discuss important avenues of future research into animal-derived viruses with public health risks.

Subject(s)

COVID-19 , Cat Diseases , Deer , Dog Diseases , Animals , Dogs , Cats , Humans , SARS-CoV-2/genetics , Deer/metabolism , Mink/metabolism , Risk Assessment , Spike Glycoprotein, Coronavirus/genetics , Mutation , Protein Binding

17.

Evaluating the impact of alternative phenotype definitions on incidence rates across a global data network.

Makadia, Rupa; Shoaibi, Azza; Rao, Gowtham A; Ostropolets, Anna; Rijnbeek, Peter R; Voss, Erica A; Duarte-Salles, Talita; Ramírez-Anguita, Juan Manuel; Mayer, Miguel A; Maljkovic, Filip; Denaxas, Spiros; Nyberg, Fredrik; Papez, Vaclav; Sena, Anthony G; Alshammari, Thamir M; Lai, Lana Y H; Haynes, Kevin; Suchard, Marc A; Hripcsak, George; Ryan, Patrick B.

JAMIA Open ; 6(4): ooad096, 2023 Dec.

Article in English | MEDLINE | ID: mdl-38028730

ABSTRACT

Objective: Developing accurate phenotype definitions is critical in obtaining reliable and reproducible background rates in safety research. This study aims to illustrate the differences in background incidence rates by comparing definitions for a given outcome. Materials and Methods: We used 16 data sources to systematically generate and evaluate outcomes for 13 adverse events and their overall background rates. We examined the effect of different modifications (inpatient setting, standardization of code set, and code set changes) to the computable phenotype on background incidence rates. Results: Rate ratios (RRs) of the incidence rates from each computable phenotype definition varied across outcomes, with inpatient restriction showing the highest variation from 1 to 11.93. Standardization of code set RRs ranges from 1 to 1.64, and code set changes range from 1 to 2.52. Discussion: The modification that has the highest impact is requiring inpatient place of service, leading to at least a 2-fold higher incidence rate in the base definition. Standardization showed almost no change when using source code variations. The strength of the effect in the inpatient restriction is highly dependent on the outcome. Changing definitions from broad to narrow showed the most variability by age/gender/database across phenotypes and less than a 2-fold increase in rate compared to the base definition. Conclusion: Characterization of outcomes across a network of databases yields insights into sensitivity and specificity trade-offs when definitions are altered. Outcomes should be thoroughly evaluated prior to use for background rates for their plausibility for use across a global network.

18.

An explainable machine learning-based phenomapping strategy for adaptive predictive enrichment in randomized clinical trials.

Oikonomou, Evangelos K; Thangaraj, Phyllis M; Bhatt, Deepak L; Ross, Joseph S; Young, Lawrence H; Krumholz, Harlan M; Suchard, Marc A; Khera, Rohan.

NPJ Digit Med ; 6(1): 217, 2023 Nov 25.

Article in English | MEDLINE | ID: mdl-38001154

ABSTRACT

Randomized clinical trials (RCTs) represent the cornerstone of evidence-based medicine but are resource-intensive. We propose and evaluate a machine learning (ML) strategy of adaptive predictive enrichment through computational trial phenomaps to optimize RCT enrollment. In simulated group sequential analyses of two large cardiovascular outcomes RCTs of (1) a therapeutic drug (pioglitazone versus placebo; Insulin Resistance Intervention after Stroke (IRIS) trial), and (2) a disease management strategy (intensive versus standard systolic blood pressure reduction in the Systolic Blood Pressure Intervention Trial (SPRINT)), we constructed dynamic phenotypic representations to infer response profiles during interim analyses and examined their association with study outcomes. Across three interim timepoints, our strategy learned dynamic phenotypic signatures predictive of individualized cardiovascular benefit. By conditioning a prospective candidate's probability of enrollment on their predicted benefit, we estimate that our approach would have enabled a reduction in the final trial size across ten simulations (IRIS: -14.8% ± 3.1%, one-sample t-test p = 0.001; SPRINT: -17.6% ± 3.6%, one-sample t-test p < 0.001), while preserving the original average treatment effect (IRIS: hazard ratio of 0.73 ± 0.01 for pioglitazone vs placebo, vs 0.76 in the original trial; SPRINT: hazard ratio of 0.72 ± 0.01 for intensive vs standard systolic blood pressure, vs 0.75 in the original trial; all simulations with Cox regression-derived p value of < 0.01 for the effect of the intervention on the respective primary outcome). This adaptive framework has the potential to maximize RCT enrollment efficiency.

19.

Shrinkage-based Random Local Clocks with Scalable Inference.

Fisher, Alexander A; Ji, Xiang; Nishimura, Akihiko; Baele, Guy; Lemey, Philippe; Suchard, Marc A.

Mol Biol Evol ; 40(11), 2023 Nov 03.

Article in English | MEDLINE | ID: mdl-37950885

ABSTRACT

Molecular clock models undergird modern methods of divergence-time estimation. Local clock models propose that the rate of molecular evolution is constant within phylogenetic subtrees. Current local clock inference procedures exhibit one or more weaknesses, namely they achieve limited scalability to trees with large numbers of taxa, impose model misspecification, or require a priori knowledge of the existence and location of clocks. To overcome these challenges, we present an autocorrelated, Bayesian model of heritable clock rate evolution that leverages heavy-tailed priors with mean zero to shrink increments of change between branch-specific clocks. We further develop an efficient Hamiltonian Monte Carlo sampler that exploits closed form gradient computations to scale our model to large trees. Inference under our shrinkage clock exhibits a speed-up compared to the popular random local clock when estimating branch-specific clock rates on a variety of simulated datasets. This speed-up increases with the size of the problem. We further show our shrinkage clock recovers known local clocks within a rodent and mammalian phylogeny. Finally, in a problem that once appeared computationally impractical, we investigate the heritable clock structure of various surface glycoproteins of influenza A virus in the absence of prior knowledge about clock placement. We implement our shrinkage clock and make it publicly available in the BEAST software package.

Subject(s)

Evolution, Molecular , Mammals , Animals , Phylogeny , Bayes Theorem , Time Factors , Models, Genetic

20.

APOBEC3 deaminase editing in mpox virus as evidence for sustained human transmission since at least 2016.

O'Toole, Áine; Neher, Richard A; Ndodo, Nnaemeka; Borges, Vitor; Gannon, Ben; Gomes, João Paulo; Groves, Natalie; King, David J; Maloney, Daniel; Lemey, Philippe; Lewandowski, Kuiama; Loman, Nicholas; Myers, Richard; Omah, Ifeanyi F; Suchard, Marc A; Worobey, Michael; Chand, Meera; Ihekweazu, Chikwe; Ulaeto, David; Adetifa, Ifedayo; Rambaut, Andrew.

Science ; 382(6670): 595-600, 2023 11 03.

Article in English | MEDLINE | ID: mdl-37917680

ABSTRACT

Historically, mpox has been characterized as an endemic zoonotic disease that transmits through contact with the reservoir rodent host in West and Central Africa. However, in May 2022, human cases of mpox were detected spreading internationally beyond countries with known endemic reservoirs. When the first cases from 2022 were sequenced, they shared 42 nucleotide differences from the closest mpox virus (MPXV) previously sampled. Nearly all these mutations are characteristic of the action of APOBEC3 deaminases, host enzymes with antiviral function. Assuming APOBEC3 editing is characteristic of human MPXV infection, we developed a dual-process phylogenetic molecular clock that, inferring a rate of ~6 APOBEC3 mutations per year, estimates that MPXV has been circulating in humans since 2016. These observations of sustained MPXV transmission present a fundamental shift to the perceived paradigm of MPXV epidemiology as a zoonosis and highlight the need for revising public health messaging around MPXV as well as outbreak management and control.

Subject(s)

APOBEC Deaminases , Monkeypox virus , Mpox , RNA Editing , Viral Zoonoses , Animals , Humans , Africa, Central/epidemiology , Africa, Western/epidemiology , APOBEC Deaminases/genetics , Disease Outbreaks , Mpox/epidemiology , Mpox/genetics , Mpox/transmission , Monkeypox virus/genetics , Monkeypox virus/metabolism , Mutation , Phylogeny , Viral Zoonoses/genetics , Viral Zoonoses/transmission

