Results 1 - 20 of 84
1.
Bioinformatics ; 40(Supplement_1): i39-i47, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940175

ABSTRACT

MOTIVATION: The World Health Organization estimates that there were over 10 million cases of tuberculosis (TB) worldwide in 2019, resulting in over 1.4 million deaths, with a worrisome increasing trend yearly. The disease is caused by Mycobacterium tuberculosis (MTB) through airborne transmission. Treatment of TB is estimated to be 85% successful; however, this drops to 57% if MTB exhibits multiple antimicrobial resistance (AMR), for which fewer treatment options are available. RESULTS: We develop a robust machine-learning classifier using both linear and nonlinear models (i.e., LASSO logistic regression (LR) and random forests (RF)) to predict the phenotypic resistance of MTB for a broad range of antibiotic drugs. To train our classifier, we use data from the CRyPTIC consortium, which consist of whole genome sequencing and antibiotic susceptibility testing (AST) phenotypic data for 13 different antibiotics. To train our model, we assemble the sequence data into genomic contigs, identify all unique 31-mers in the set of contigs, and build a feature matrix M, where M[i, j] is equal to the number of times the ith 31-mer occurs in the jth genome. Due to the size of this feature matrix (over 350 million unique 31-mers), we build and use a sparse matrix representation. Our method, which we refer to as MTB++, leverages compact data structures and iterative methods to allow for the screening of all the 31-mers in the development of both LASSO LR and RF. MTB++ is able to achieve high discrimination (F-1 > 80%) for the first-line antibiotics. Moreover, MTB++ had the highest F-1 score in all but three classes and was the most comprehensive, since it had an F-1 score > 75% in all but four (rare) antibiotic drugs. We use our feature selection to contextualize the 31-mers that are used for the prediction of phenotypic resistance, leading to some insights about sequence similarity to genes in MEGARes. Lastly, we give an estimate of the amount of data that is needed in order to provide accurate predictions. AVAILABILITY: The models and source code are publicly available on GitHub at https://github.com/M-Serajian/MTB-Pipeline.
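
The feature matrix described in this abstract can be illustrated with a small sketch. It is not the MTB++ implementation (which relies on compact data structures not shown here); the function names and the dict-of-dicts layout are assumptions made for illustration, and the toy example uses k = 3 instead of 31 so the result is easy to inspect.

```python
from collections import Counter, defaultdict

def kmers(seq, k):
    """Yield every length-k substring of seq (no reverse-complement handling)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def sparse_kmer_matrix(genomes, k=31):
    """Return (index, M) where index maps each unique k-mer to a row i and
    M[i] is a dict {j: count} that stores only the nonzero entries M[i, j]."""
    index = {}
    M = defaultdict(dict)
    for j, contigs in enumerate(genomes):
        counts = Counter()
        for contig in contigs:              # k-mers never span two contigs
            counts.update(kmers(contig, k))
        for kmer, c in counts.items():
            i = index.setdefault(kmer, len(index))
            M[i][j] = c
    return index, M

# Toy input (hypothetical): two "genomes", each a list of contigs.
genomes = [["ACGTAC"], ["ACGA", "TACG"]]
index, M = sparse_kmer_matrix(genomes, k=3)
print(M[index["ACG"]])   # counts of "ACG" per genome
```

Storing only nonzero counts is what keeps such a matrix tractable when most k-mers occur in few genomes.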


Subject(s)
Machine Learning, Mycobacterium tuberculosis, Mycobacterium tuberculosis/genetics, Mycobacterium tuberculosis/drug effects, Drug Resistance, Bacterial/genetics, Microbial Sensitivity Tests, Anti-Bacterial Agents/pharmacology, Whole Genome Sequencing/methods, Genome, Bacterial, Humans
2.
Life (Basel) ; 14(6), 2024 May 24.
Article in English | MEDLINE | ID: mdl-38929660

ABSTRACT

Life on our planet likely evolved in the ocean, and thus exo-oceans are key habitats to search for extraterrestrial life. We conducted a data-driven bibliographic survey on the astrobiology literature to identify emerging research trends with marine science for future synergies in the exploration for extraterrestrial life in exo-oceans. Based on search queries, we identified 2592 published items since 1963. The current literature falls into three major groups of terms focusing on (1) the search for life on Mars, (2) astrobiology within our Solar System with reference to icy moons and their exo-oceans, and (3) astronomical and biological parameters for planetary habitability. We also identified that the most prominent research keywords form three key-groups focusing on (1) using terrestrial environments as proxies for Martian environments, centred on extremophiles and biosignatures, (2) habitable zones outside of "Goldilocks" orbital ranges, centred on ice planets, and (3) the atmosphere, magnetic field, and geology in relation to planets' habitable conditions, centred on water-based oceans.

3.
bioRxiv ; 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38559026

ABSTRACT

Portable genomic sequencers such as Oxford Nanopore's MinION enable real-time applications in both clinical and environmental health, e.g., detection of bacterial outbreaks. However, there is a bottleneck in the downstream analytics when bioinformatics pipelines are unavailable, e.g., when cloud processing is unreachable due to absence of Internet connection, or only low-end computing devices can be carried on site. For instance, metagenomics classifiers usually require a large amount of memory or specific operating systems/libraries. In this work, we present a platform-friendly software for portable metagenomic analysis of Nanopore data, the Oligomer-based Classifier of Taxonomic Operational and Pan-genome Units via Singletons (OCTOPUS). OCTOPUS is written in Java, reimplements several features of the popular Kraken2 and KrakenUniq software, with original components for improving metagenomics classification on incomplete/sampled reference databases (e.g., selection of bacteria of public health priority), making it ideal for running on smartphones or tablets. We indexed both OCTOPUS and Kraken2 on a bacterial database with ~4,000 reference genomes, then simulated a positive (bacterial genomes from the same species, but different genomes) and two negative (viral, mammalian) Nanopore test sets. On the bacterial test set OCTOPUS yielded sensitivity and precision comparable to Kraken2 (94.4% and 99.8% versus 94.5% and 99.1%, respectively). On non-bacterial sequences (mammals and viral), OCTOPUS dramatically decreased (4- to 16-fold) the false positive rate when compared to Kraken2 (2.1% and 0.7% versus 8.2% and 11.2%, respectively). We also developed customized databases including viruses, and the World Health Organization's set of bacteria of concern for drug resistance, tested with real Nanopore data on an Android smartphone. OCTOPUS is publicly available at https://github.com/DataIntellSystLab/OCTOPUS and https://github.com/Ruiz-HCI-Lab/OctopusMobile.
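
As a quick arithmetic check of the "4- to 16-fold" reduction quoted above, the sketch below recomputes the fold change from the reported false positive rates and defines the standard metrics involved. The confusion-matrix counts in the usage line are hypothetical and only illustrate the formulas.

```python
def sensitivity(tp, fn):
    """Fraction of true positives recovered: TP / (TP + FN)."""
    return tp / (tp + fn)

def precision(tp, fp):
    """Fraction of positive calls that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)

def false_positive_rate(fp, tn):
    """Fraction of negatives wrongly called positive: FP / (FP + TN)."""
    return fp / (fp + tn)

# False positive rates (%) in the order they are quoted in the abstract.
octopus_fpr = (2.1, 0.7)
kraken2_fpr = (8.2, 11.2)
fold_reduction = [round(k / o, 1) for o, k in zip(octopus_fpr, kraken2_fpr)]
print(fold_reduction)

# Hypothetical counts, for illustration only.
example_sensitivity = sensitivity(tp=944, fn=56)
```

The two ratios come out at roughly 3.9 and 16, consistent with the range stated in the abstract.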

4.
J Transl Med ; 22(1): 269, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38475767

ABSTRACT

BACKGROUND: Chemotherapy is a primary treatment for cancer, but its efficacy is often limited by cancer-associated bacteria (CAB) that impair tumor suppressor functions. Our previous research found that Mycoplasma fermentans DnaK, a chaperone protein, impairs p53 activities, which are essential for most anti-cancer chemotherapeutic responses. METHODS: To investigate the role of DnaK in chemotherapy, we treated cancer cell lines with M. fermentans DnaK and then with commonly used p53-dependent anti-cancer drugs (cisplatin and 5FU). We evaluated the cells' survival in the presence or absence of a DnaK-binding peptide (ARV-1502). We also validated our findings using primary tumor cells from a novel DnaK knock-in mouse model. To provide a broader context for the clinical significance of these findings, we investigated human primary cancer sequencing datasets from The Cancer Genome Atlas (TCGA). We identified F. nucleatum as a CAB carrying DnaK with an amino acid composition highly similar to M. fermentans DnaK. Therefore, we investigated the effect of F. nucleatum DnaK on the anti-cancer activity of cisplatin and 5FU. RESULTS: Our results show that both M. fermentans and F. nucleatum DnaKs reduce the effectiveness of cisplatin and 5FU. However, the use of ARV-1502 effectively restored the drugs' anti-cancer efficacy. CONCLUSIONS: Our findings offer a practical framework for designing and implementing novel personalized anti-cancer strategies by targeting specific bacterial DnaKs in patients with poor response to chemotherapy, underscoring the potential for microbiome-based personalized cancer therapies.


Subject(s)
Antineoplastic Agents, Neoplasms, Animals, Mice, Humans, Cisplatin, Tumor Suppressor Protein p53, Fluorouracil, Bacteria
5.
Article in English | MEDLINE | ID: mdl-38252549

ABSTRACT

Introduction: HIV-related comorbidities appear to be related to chronic inflammation, a condition characterizing people living with HIV (PLWH). Prior work indicates that cannabidiol (CBD) might reduce inflammation; however, the genetic underpinnings of this effect are not well investigated. Our main objective is to detect gene expression alterations in human peripheral blood mononuclear cells (PBMCs) from PLWH after at least 1 month of CBD treatment. Materials and Methods: We analyzed ∼41,000 PBMCs from three PLWH at baseline and after CBD treatment (27-60 days) through single-cell RNA sequencing. Results: We obtained a coherent signature of differentially expressed genes in myeloid cells, characterized by anti-inflammatory activity. Conclusions: Our study shows that CBD treatment is associated with alterations of gene expression in myeloid cells. Clinical Trial Registration: NCT05209867.

6.
bioRxiv ; 2024 Feb 09.
Article in English | MEDLINE | ID: mdl-37961168

ABSTRACT

The coronavirus disease of 2019 (COVID-19) pandemic is characterized by sequential emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants and lineages outcompeting previously circulating ones because of, among other factors, increased transmissibility and immune escape [1-3]. We devised an unsupervised deep learning AutoEncoder for anomaly detection in viral genomes to predict future dominant lineages (FDLs), i.e., lineages or sublineages comprising ≥10% of viral sequences added to the GISAID database on a given week [4]. The algorithm was trained and validated by assembling global and country-specific data sets from 16,187,950 Spike protein sequences sampled between December 24th, 2019, and November 8th, 2023. The AutoEncoder flags low-frequency FDLs (0.01% - 3%), with median lead times of 4-16 weeks. Over time, positive predictive values oscillate, decreasing linearly with the number of unique sequences per data set, showing average performance up to 30 times better than baseline approaches. The B.1.617.2 vaccine reference strain was flagged as FDL when its frequency was only 0.01%, more than one year before it was considered for an updated COVID-19 vaccine. Our AutoEncoder, applicable in principle to any pathogen, also pinpoints specific mutations potentially linked to increased fitness, and may provide significant insights for the optimization of public health pre-emptive intervention strategies.
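
The FDL criterion and the notion of lead time used in this abstract can be made concrete with a short sketch. It does not reproduce the AutoEncoder; it only shows the ≥10% weekly-share rule and how a lead time would be measured once a lineage has been flagged. The function names, integer week indices, and toy lineage labels are assumptions.

```python
from collections import Counter

def dominant_lineages(weekly_lineages, threshold=0.10):
    """Map each week to the set of lineages that make up at least
    `threshold` of the sequences submitted in that week."""
    flagged = {}
    for week, lineages in weekly_lineages.items():
        counts = Counter(lineages)
        total = len(lineages)
        flagged[week] = {l for l, c in counts.items() if c / total >= threshold}
    return flagged

def lead_time(flag_week, weekly_lineages, lineage, threshold=0.10):
    """Weeks between an early flag and the first week the lineage is
    dominant; None if it never reaches the threshold."""
    dominant = dominant_lineages(weekly_lineages, threshold)
    weeks = sorted(w for w, s in dominant.items() if lineage in s)
    return weeks[0] - flag_week if weeks else None

# Toy data (hypothetical): lineage "B" grows from 1% to 20% in three weeks.
weekly = {
    1: ["A"] * 99 + ["B"] * 1,
    2: ["A"] * 95 + ["B"] * 5,
    3: ["A"] * 80 + ["B"] * 20,
}
print(dominant_lineages(weekly))
print(lead_time(1, weekly, "B"))
```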

7.
Proc Mach Learn Res ; 218: 98-115, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37854935

ABSTRACT

Developing models for individualized, time-varying treatment optimization from observational data with large variable spaces, e.g., electronic health records (EHR), is problematic because of inherent, complex bias that can change over time. Traditional methods such as the g-formula are robust, but must identify critical subsets of variables due to combinatorial issues. Machine learning approaches such as causal survival forests have fewer constraints and can provide fine-tuned, individualized counterfactual predictions. In this study, we aimed to optimize time-varying antibiotic treatment (identifying treatment heterogeneity and conditional treatment effects) against invasive methicillin-resistant Staphylococcus aureus (MRSA) infections, using statewide EHR data collected in Florida, USA. While many previous studies focused on measuring the effects of the first empiric treatment (usually vancomycin), our study focuses on dynamic sequential treatment changes, comparing possible vancomycin switches with other antibiotics at clinically relevant time points, e.g., after obtaining a bacterial culture and susceptibility testing. Our study population included adult individuals admitted to the hospital with invasive MRSA. We collected demographic, clinical, medication, and laboratory information from the EHR for these patients. Then, we followed three sequential antibiotic choices (i.e., their empiric treatment, subsequent directed treatment, and final sustaining treatment), evaluating 30-day mortality as the outcome. We applied both causal survival forests and the g-formula using different clinical intervention policies. We found that switching from vancomycin to another antibiotic improved survival probability, yet there was a benefit from initiating vancomycin compared to not using it at any time point. These findings are consistent with the empiric choice of vancomycin before confirmation of MRSA and shed light on how to manage switches during the course of treatment. In conclusion, this application of causal machine learning on EHR demonstrates utility in modeling dynamic, heterogeneous treatment effects that cannot be evaluated precisely using randomized clinical trials.

8.
Front Microbiol ; 14: 1060891, 2023.
Article in English | MEDLINE | ID: mdl-36960290

ABSTRACT

Characterization of antibiotic resistance genes (ARGs) from high-throughput sequencing data of metagenomics and cultured bacterial samples is a challenging task, with the need to account for both computational (e.g., string algorithms) and biological (e.g., gene transfers, rearrangements) aspects. Curated ARG databases exist together with assorted ARG classification approaches (e.g., database alignment, machine learning). Besides ARGs that naturally occur in bacterial strains or are acquired through mobile elements, there are chromosomal genes that can render a bacterium resistant to antibiotics through point mutations, i.e., ARG variants (ARGVs). While ARG repositories also collect ARGVs, only a few tools are able to identify ARGVs from metagenomics and high-throughput sequencing data, and they have a number of limitations (e.g., pre-assembly, a posteriori verification of mutations, or specification of species). In this work we present the k-mer (i.e., string of fixed length k) ARGV analyzer, KARGVA, an open-source, multi-platform tool that provides: (i) an ad hoc, large ARGV database derived from multiple sources; (ii) input capability for various types of high-throughput sequencing data; (iii) a three-way, hash-based, k-mer search setup to process data efficiently, linking k-mers to ARGVs, k-mers to point mutations, and ARGVs to k-mers, respectively; (iv) a statistical filter on sequence classification to reduce type I and II errors. On semi-synthetic data, KARGVA provides very high accuracy even in the presence of high sequencing errors or mutations (99.2% and 86.6% accuracy at 1% and 5% base change rates, respectively), and genome rearrangements (98.2% accuracy), with robust performance on ad hoc false positive sets. On data from the worldwide MetaSUB consortium, comprising 3,700+ metagenomics experiments, KARGVA identifies more ARGVs than Resistance Gene Identifier (4.8x) and PointFinder (6.8x), yet all predictions are below the expected false positive estimates. The prevalence of ARGVs is correlated with that of ARGs, but ecological characteristics do not explain ARGV variance well. KARGVA is publicly available at https://github.com/DataIntellSystLab/KARGVA under the MIT license.
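
The three-way hash setup in item (iii) can be sketched as three dictionaries built in one pass over the reference genes. This is an illustrative reading of the abstract, not KARGVA's code: the data layout, the function names, and the toy gene are assumptions, and the statistical filter of item (iv) is not implemented.

```python
from collections import defaultdict

def build_indexes(argvs, k):
    """argvs: {gene_id: (sequence, [0-based mutated positions])}."""
    kmer_to_genes = defaultdict(set)      # k-mer -> ARGVs containing it
    kmer_to_mutations = defaultdict(set)  # k-mer -> (gene, position) it covers
    gene_to_kmers = defaultdict(set)      # ARGV  -> all of its k-mers
    for gene, (seq, positions) in argvs.items():
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            kmer_to_genes[kmer].add(gene)
            gene_to_kmers[gene].add(kmer)
            for pos in positions:
                if start <= pos < start + k:
                    kmer_to_mutations[kmer].add((gene, pos))
    return kmer_to_genes, kmer_to_mutations, gene_to_kmers

def classify_read(read, indexes, k):
    """Return {gene: (fraction of the gene's k-mers seen in the read,
    sorted mutated positions covered by matching k-mers)}."""
    kmer_to_genes, kmer_to_mutations, gene_to_kmers = indexes
    seen, covered = defaultdict(set), defaultdict(set)
    for start in range(len(read) - k + 1):
        kmer = read[start:start + k]
        for gene in kmer_to_genes.get(kmer, ()):
            seen[gene].add(kmer)
        for gene, pos in kmer_to_mutations.get(kmer, ()):
            covered[gene].add(pos)
    return {g: (len(seen[g]) / len(gene_to_kmers[g]), sorted(covered[g]))
            for g in seen}

# Toy reference (hypothetical): one gene with a resistance mutation at position 4.
indexes = build_indexes({"geneA": ("ACGTTGCA", [4])}, k=4)
print(classify_read("CGTTGC", indexes, k=4))
```

A read that matches a gene but covers none of its mutated positions would show an empty position list, which is the case a variant-aware tool has to distinguish from a confirmed resistance variant.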

9.
Microbiol Spectr ; : e0308622, 2023 Feb 27.
Article in English | MEDLINE | ID: mdl-36847516

ABSTRACT

In human immunodeficiency virus (HIV) infection, virus replication in and adaptation to the central nervous system (CNS) can result in neurocognitive deficits in approximately 25% of patients with unsuppressed viremia. While no single viral mutation can be agreed upon as distinguishing the neuroadapted population, earlier studies have demonstrated that a machine learning (ML) approach could be applied to identify a collection of mutational signatures within the virus envelope glycoprotein (gp120) predictive of disease. The simian immunodeficiency virus (SIV)-infected macaque is a widely used animal model of HIV neuropathology, allowing in-depth tissue sampling infeasible for human patients. Yet the translational impact of the ML approach within the context of the macaque model has not been tested, much less the capacity for early prediction in other, noninvasive tissues. We applied the previously described ML approach to predict SIV-mediated encephalitis (SIVE) from gp120 sequences obtained from the CNS of animals with and without SIVE, achieving 97% accuracy. The presence of SIVE signatures at earlier time points of infection in non-CNS tissues indicated these signatures cannot be used in a clinical setting; however, combined with protein structural mapping and statistical phylogenetic inference, results revealed common denominators associated with these signatures, including 2-acetamido-2-deoxy-beta-d-glucopyranose structural interactions and high rate of alveolar macrophage (AM) infection. AMs were also determined to be the phyloanatomic source of cranial virus in SIVE animals, but not in animals that did not develop SIVE, implicating a role for these cells in the evolution of the signatures identified as predictive of both HIV and SIV neuropathology. IMPORTANCE: HIV-associated neurocognitive disorders remain prevalent among persons living with HIV (PLWH) owing to our limited understanding of the contributing viral mechanisms and ability to predict disease onset. We have expanded on a machine learning method previously used on HIV genetic sequence data to predict neurocognitive impairment in PLWH to the more extensively sampled SIV-infected macaque model in order to (i) determine the translatability of the animal model and (ii) more accurately characterize the predictive capacity of the method. We identified eight amino acid and/or biochemical signatures in the SIV envelope glycoprotein, the most predominant of which demonstrated the potential for aminoglycan interaction characteristic of previously identified HIV signatures. These signatures were not isolated to specific points in time or to the central nervous system, limiting their use as an accurate clinical predictor of neuropathogenesis; however, statistical phylogenetic and signature pattern analyses implicate the lungs as a key player in the emergence of neuroadapted viruses.

10.
Sci Data ; 10(1): 5, 2023 01 03.
Article in English | MEDLINE | ID: mdl-36596792

ABSTRACT

Multiparametric video-cabled marine observatories are becoming strategic for monitoring the marine ecosystem remotely and in real time. Those platforms can acquire continuous, high-frequency and long-lasting image data sets that require automation in order to extract biological time series. The OBSEA, located 4 km from Vilanova i la Geltrú at 20 m depth, was used to produce coastal fish time series continuously, 24 h a day, during 2013-2014. The image content of the photos was extracted via tagging, resulting in 69,917 fish tags of 30 taxa identified. We also provided a meteorological and oceanographic dataset filtered by a quality control procedure to define real-world conditions affecting image quality. The tagged fish dataset can be of great importance to develop Artificial Intelligence routines for the automated identification and classification of fishes in extensive time-lapse image sets.


Subject(s)
Artificial Intelligence, Ecosystem, Fishes, Animals, Algorithms, Benchmarking
11.
Nucleic Acids Res ; 51(D1): D744-D752, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36382407

ABSTRACT

Antimicrobial resistance (AMR) is considered a critical threat to public health, and genomic/metagenomic investigations featuring high-throughput analysis of sequence data are increasingly common and important. We previously introduced MEGARes, a comprehensive AMR database with an acyclic hierarchical annotation structure that facilitates high-throughput computational analysis, as well as AMR++, a customized bioinformatic pipeline specifically designed to use MEGARes in high-throughput analysis for characterizing AMR genes (ARGs) in metagenomic sequence data. Here, we present MEGARes v3.0, a comprehensive database of published ARG sequences for antimicrobial drugs, biocides, and metals, and AMR++ v3.0, an update to our customized bioinformatic pipeline for high-throughput analysis of metagenomic data (available at MEGLab.org). Database annotations have been expanded to include information regarding specific genomic locations for single-nucleotide polymorphisms (SNPs) and insertions and/or deletions (indels) when required by specific ARGs for resistance expression, and the updated AMR++ pipeline uses this information to check for presence of resistance-conferring genetic variants in metagenomic sequenced reads. This new information encompasses 337 ARGs, whose resistance-conferring variants could not previously be confirmed in such a manner. In MEGARes 3.0, the nodes of the acyclic hierarchical ontology include 4 antimicrobial compound types, 59 resistance classes, 233 mechanisms and 1448 gene groups that classify the 8733 accessions.
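
One practical consequence of an acyclic hierarchy (type, class, mechanism, group) is that per-accession counts can be summed upward without double counting, because each accession has a single path to the root. The sketch below illustrates that idea only; the accession identifiers, annotation values, and hit counts are hypothetical and do not come from MEGARes.

```python
from collections import defaultdict

LEVELS = ("type", "class", "mechanism", "group")

def aggregate(hits, annotations):
    """Sum per-accession hit counts at every level of the hierarchy."""
    totals = {level: defaultdict(int) for level in LEVELS}
    for accession, count in hits.items():
        path = annotations[accession]   # (type, class, mechanism, group)
        for level, node in zip(LEVELS, path):
            totals[level][node] += count
    return totals

# Hypothetical annotations and read counts, for illustration only.
annotations = {
    "acc1": ("Drugs", "betalactams", "Class A betalactamases", "TEM"),
    "acc2": ("Drugs", "betalactams", "Class A betalactamases", "SHV"),
    "acc3": ("Metals", "Copper resistance", "Copper efflux", "COPA"),
}
hits = {"acc1": 10, "acc2": 5, "acc3": 2}
totals = aggregate(hits, annotations)
print(dict(totals["class"]))
```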


Subject(s)
Anti-Bacterial Agents, Anti-Infective Agents, Anti-Bacterial Agents/pharmacology, Drug Resistance, Bacterial/genetics, Software, High-Throughput Nucleotide Sequencing
12.
Ann Surg ; 278(2): e349-e359, 2023 08 01.
Article in English | MEDLINE | ID: mdl-36111847

ABSTRACT

OBJECTIVE: Our objective was to identify macrophage subpopulations and gene signatures associated with regenerative or fibrotic healing across different musculoskeletal injury types. BACKGROUND: Subpopulations of macrophages are hypothesized to fine-tune the immune response after damage, promoting either normal regenerative or aberrant fibrotic healing. METHODS: Mouse single-cell RNA sequencing data before and after injury were assembled from models of musculoskeletal injury, including regenerative and fibrotic mouse volumetric muscle loss (VML), regenerative digit tip amputation, and fibrotic heterotopic ossification. R packages Harmony, MacSpectrum, and Seurat were used for data integration, analysis, and visualizations. RESULTS: There was a substantial overlap between macrophages from the regenerative VML (2 mm injury) and regenerative bone models, as well as a separate overlap between the fibrotic VML (3 mm injury) and fibrotic bone (heterotopic ossification) models. We identified 2 fibrotic-like (FL 1 and FL 2) along with 3 regenerative-like (RL 1, RL 2, and RL 3) subpopulations of macrophages, each of which was transcriptionally distinct. We found that regenerative and fibrotic conditions had similar compositions of proinflammatory and anti-inflammatory macrophages, suggesting that macrophage polarization state did not correlate with healing outcomes. Receptor/ligand analysis of macrophage-to-mesenchymal progenitor cell crosstalk showed enhanced transforming growth factor β in fibrotic conditions and enhanced platelet-derived growth factor signaling in regenerative conditions. CONCLUSION: Characterization of macrophage subtypes could be used to predict fibrotic responses following injury and provide a therapeutic target to tune the healing microenvironment towards more regenerative conditions.


Subject(s)
Muscle, Skeletal, Ossification, Heterotopic, Mice, Animals, Macrophages, Wound Healing/physiology, Platelet-Derived Growth Factor
13.
Sci Data ; 9(1): 750, 2022 12 03.
Article in English | MEDLINE | ID: mdl-36463241

ABSTRACT

Antarctica is a remote place: the continent is covered by ice and its surrounding coastal areas are frozen for the majority of the year. Because of these conditions, the observation of underwater organisms is particularly difficult and is further complicated by logistic factors. We present a long-term dataset consisting of 755 images acquired by using a non-invasive, autonomous imaging device and encompassing both the Antarctic daylight and dark periods, including the corresponding transition phases. All images have the same field of view showing the benthic fauna and part of the water column above, including fishes present in the monitored period. All the images were manually annotated after a visual inspection performed by expert biologists. The extended monitoring period and the annotated images make the dataset a valuable benchmark suitable for studying the long-term dynamics of the Antarctic underwater fauna as well as for developing and testing algorithms for automated image analysis focused on the recognition and classification of the Antarctic organisms and the automated analysis of their long-term dynamics.

14.
Front Bioeng Biotechnol ; 10: 1016408, 2022.
Article in English | MEDLINE | ID: mdl-36324897

ABSTRACT

Nanopore technology enables portable, real-time sequencing of microbial populations from clinical and ecological samples. An emerging healthcare application for Nanopore includes point-of-care, timely identification of antibiotic resistance genes (ARGs) to help develop targeted treatments of bacterial infections and to monitor resistant outbreaks in the environment. While several computational tools exist for classifying ARGs from sequencing data, to date (2022) none have been developed for mobile devices. We present here KARGAMobile, a mobile app for portable, real-time, easily interpretable analysis of ARGs from Nanopore sequencing. KARGAMobile is a port of an existing ARG identification tool named KARGA; it retains the same algorithmic structure, but it is optimized for mobile devices. Specifically, KARGAMobile employs a compressed ARG reference database and different internal data structures to reduce RAM usage. The KARGAMobile app features a friendly graphical user interface that guides users through file browsing, loading, parameter setup, and process execution. More importantly, the output files are post-processed to create visual, printable and shareable reports, helping users interpret the ARG findings. The difference in classification performance between KARGAMobile and KARGA is minimal (96.2% vs. 96.9% f-measure on semi-synthetic datasets of 1 million reads with known resistance ground truth). Using real Nanopore experiments, KARGAMobile processes on average 1 GB of data every 23-48 min (targeted sequencing - metagenomics), with peak RAM usage below 500 MB, independently of input file sizes, and an average temperature of 49°C after 1 h of continuous data processing. KARGAMobile is written in Java and is available at https://github.com/Ruiz-HCI-Lab/KargaMobile under the MIT license.

15.
JCI Insight ; 7(20), 2022 10 24.
Article in English | MEDLINE | ID: mdl-36099022

ABSTRACT

Transforming growth factor-β1 (TGF-β1) plays a central role in normal and aberrant wound healing, but the precise mechanism in the local environment remains elusive. Here, using a mouse model of aberrant wound healing resulting in heterotopic ossification (HO) after traumatic injury, we find autocrine TGF-β1 signaling in macrophages, and not mesenchymal stem/progenitor cells, is critical in HO formation. In-depth single-cell transcriptomic and epigenomic analyses in combination with immunostaining of cells from the injury site demonstrated increased TGF-β1 signaling in early infiltrating macrophages, with open chromatin regions in TGF-β1-stimulated genes at binding sites specific for transcription factors of activated TGF-β1 (SMAD2/3). Genetic deletion of TGF-β1 receptor type 1 (Tgfbr1; Alk5), in macrophages, resulted in increased HO, with a trend toward decreased tendinous HO. To bypass the effect seen by altering the receptor, we administered a systemic treatment with TGF-β1/3 ligand trap TGF-βRII-Fc, which resulted in decreased HO formation and a delay in macrophage infiltration to the injury site. Overall, our data support the role of the TGF-β1/ALK5 signaling pathway in HO.


Subject(s)
Ossification, Heterotopic, Transforming Growth Factor beta1, Humans, Chromatin/metabolism, Ligands, Macrophages/metabolism, Ossification, Heterotopic/metabolism, Receptor, Transforming Growth Factor-beta Type I/genetics, Transforming Growth Factor beta1/metabolism, Wound Healing, Transforming Growth Factor beta/metabolism
16.
AMIA Jt Summits Transl Sci Proc ; 2022: 274-283, 2022.
Article in English | MEDLINE | ID: mdl-35854723

ABSTRACT

Drug-resistant bacterial infections are a global health concern with high mortality and limited treatment options. Several clinical risk-severity scores are available, e.g. qPitt, but their predictive performance is moderate. Here, we leveraged machine learning and electronic health records (EHRs) to improve prediction of mortality due to bloodstream infection with Klebsiella pneumoniae. We tested the qPitt score and new EHR variables (either expert-chosen or the full set of diagnostic codes), fitting LASSO, boosted logistic regression (BLR), support vector machines, decision trees, and random forests. The qPitt score showed moderate discriminative ability (AUROC=0.63), whilst machine learning models significantly improved its performance (best AUROC by BLR 0.80 for expert-chosen and 0.88 for full code set). Similar results were obtained in critically ill patients, and when excluding potential non-causal variables to evaluate an actionable model. In conclusion, current risk scores for bacteremia mortality can be improved and, with opportune causal modelling, considered for deployment in clinical decision-making.
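
The AUROC values compared in this abstract can be read as the probability that a randomly chosen patient who died receives a higher risk score than a randomly chosen survivor. The sketch below computes it directly from that definition (ties count one half); the labels and scores are toy values, not study data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives
    and negatives; a tie between a positive and a negative counts 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: three of the four positive/negative pairs are ranked correctly.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))
```

This pairwise form is quadratic in the number of samples; it is meant to show the definition, and a rank-based implementation would be preferable on large cohorts.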

17.
Artif Intell Med ; 130: 102326, 2022 08.
Article in English | MEDLINE | ID: mdl-35809965

ABSTRACT

Whole genome sequencing (WGS) is quickly becoming the customary means for identification of antimicrobial resistance (AMR) due to its ability to obtain high-resolution information about the genes and mechanisms that are causing resistance and driving pathogen mobility. By contrast, traditional phenotypic (antibiogram) testing cannot easily elucidate such information. Yet development of AMR prediction tools from genotype-phenotype data can be biased, since sampling is non-randomized. Sample provenance, period of collection, and species representation can confound the association of genetic traits with AMR. Thus, prediction models can perform poorly on new data with sampling distribution shifts. In this work, under an explicit set of causal assumptions, we evaluate the effectiveness of propensity-based rebalancing and confounding adjustment on antibiotic resistance prediction using genotype-phenotype AMR data from the Pathosystems Resource Integration Center (PATRIC). We select bacterial genotypes (encoded as k-mer signatures, i.e., DNA fragments of length k), country, year, species, and AMR phenotypes for the tetracycline drug class, preparing test data with recent genomes coming from a single country. We test boosted logistic regression (BLR) and random forests (RF) with/without bias-handling. On 10,936 instances, we find evidence of species, location and year imbalance with respect to the AMR phenotype. The crude versus bias-adjusted change in effect of genetic signatures on AMR varies but only moderately (selecting the top 20,000 out of 40+ million k-mers). The area under the receiver operating characteristic (AUROC) of the RF (0.95) is comparable to that of BLR (0.94) on both out-of-bag samples from bootstrap and the external test (n = 1085), where AUROCs do not decrease. We observe a 1%-5% gain in AUROC with bias-handling compared to the sole use of genetic signatures. In conclusion, we recommend using causally-informed prediction methods for modeling real-world AMR data; however, traditional adjustment or propensity-based methods may not provide advantage in all use cases and further methodological development should be sought.
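
To give a sense of what propensity-based rebalancing means, the sketch below applies generic stratified inverse probability weighting: within each stratum of a confounder (here, species), samples are weighted by the inverse of the empirical probability of their exposure status (here, presence of a genetic signature). This is a textbook technique shown under simplifying assumptions, not the procedure used in the study; the variable names and toy data are hypothetical.

```python
from collections import Counter

def ipw_weights(exposure, stratum):
    """Inverse probability weights with the propensity estimated as the
    empirical frequency of exposure within each stratum."""
    n = Counter(stratum)
    n_exposed = Counter(s for e, s in zip(exposure, stratum) if e == 1)
    weights = []
    for e, s in zip(exposure, stratum):
        p = n_exposed[s] / n[s]
        if p in (0.0, 1.0):
            # Positivity is violated: the stratum has no exposed or no unexposed samples.
            raise ValueError("positivity violated in stratum %r" % (s,))
        weights.append(1 / p if e == 1 else 1 / (1 - p))
    return weights

# Toy data (hypothetical): the signature is common in one species, rare in the other.
exposure = [1, 1, 1, 0, 1, 0, 0, 0]
stratum = ["E. coli"] * 4 + ["K. pneumoniae"] * 4
weights = ipw_weights(exposure, stratum)

exposed_mass = sum(w for w, e in zip(weights, exposure) if e == 1)
unexposed_mass = sum(w for w, e in zip(weights, exposure) if e == 0)
print(exposed_mass, unexposed_mass)
```

After weighting, exposed and unexposed samples carry the same total weight in every stratum, so species no longer predicts exposure in the weighted sample.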


Subject(s)
Anti-Bacterial Agents, Genome, Bacterial, Anti-Bacterial Agents/pharmacology, Drug Resistance, Bacterial/genetics, Genotype, Microbial Sensitivity Tests, Whole Genome Sequencing/methods
18.
Stud Health Technol Inform ; 294: 654-658, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612170

ABSTRACT

In this work we show that Incremental Machine Learning can be used to predict the classification of emerging SARS-CoV-2 lineages, dynamically distinguishing between neutral variants and non-neutral ones, i.e., variants of interest or variants of concern. Starting from the Spike protein primary sequences collected in the GISAID database, we derived a set of k-mer features, i.e., amino acid subsequences of fixed length k. We then implemented a Logistic Regression Incremental Learner that was tested monthly on the variants collected from February 2020 to October 2021. The average balanced accuracy of the classifier is 0.72 ± 0.2, which increased to 0.78 ± 0.16 in the last 12 months. The alpha, beta, gamma, eta, kappa and delta variants were recognized as non-neutral variants with mean recall ∼90%. In summary, incremental learning proved to be a useful instrument for pandemic surveillance, given its capability to update the model on new data over time.
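
A minimal sketch of the test-then-update loop described above, assuming binary k-mer presence features, plain stochastic gradient descent, and a 0.5 decision threshold. None of these choices is confirmed by the abstract; the class, the function names, and the two toy peptide fragments are hypothetical.

```python
import math
from collections import defaultdict

def kmer_features(protein, k=3):
    """Set of amino acid k-mers present in the sequence."""
    return {protein[i:i + k] for i in range(len(protein) - k + 1)}

class IncrementalLogReg:
    """Logistic regression updated one sample at a time with SGD."""
    def __init__(self, lr=0.1):
        self.w, self.b, self.lr = defaultdict(float), 0.0, lr

    def predict_proba(self, feats):
        z = self.b + sum(self.w.get(f, 0.0) for f in feats)
        z = max(-30.0, min(30.0, z))          # keep exp() in a safe range
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, batch):
        for feats, y in batch:
            err = self.predict_proba(feats) - y
            self.b -= self.lr * err
            for f in feats:
                self.w[f] -= self.lr * err

def balanced_accuracy(y_true, y_pred):
    """Mean recall over the classes present in y_true."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        if idx:
            recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def prequential(model, monthly_batches):
    """Score each month with the current model, then update on that month."""
    scores = []
    for batch in monthly_batches:
        y_true = [y for _, y in batch]
        y_pred = [int(model.predict_proba(f) >= 0.5) for f, _ in batch]
        scores.append(balanced_accuracy(y_true, y_pred))
        model.partial_fit(batch)
    return scores

# Toy data (hypothetical fragments): label 0 = neutral, 1 = non-neutral.
month = [(kmer_features("MFVFLVLL"), 0), (kmer_features("MFVFKVLL"), 1)]
scores = prequential(IncrementalLogReg(), [month, month])
print(scores)
```

Scoring before updating means every balanced accuracy value reflects performance on data the model has not yet seen, which matches the surveillance setting the abstract describes.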


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , Machine Learning , Mutation , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/genetics , Spike Glycoprotein, Coronavirus/metabolism
19.
Gigascience ; 11, 2022 May 18.
Article in English | MEDLINE | ID: mdl-35583675

ABSTRACT

BACKGROUND: Antimicrobial resistance (AMR) is a global health concern. High-throughput metagenomic sequencing of microbial samples enables profiling of AMR genes through comparison with curated AMR databases. However, the performance of current methods is often hampered by database incompleteness and the presence of homology/homoplasy with other non-AMR genes in sequenced samples. RESULTS: We present AMR-meta, a database-free and alignment-free approach, based on k-mers, which combines algebraic matrix factorization into metafeatures with regularized regression. Metafeatures capture multi-level gene diversity across the main antibiotic classes. AMR-meta takes in reads from metagenomic shotgun sequencing and outputs predictions about whether those reads contribute to resistance against specific classes of antibiotics. In addition, AMR-meta uses an augmented training strategy that joins an AMR gene database with non-AMR genes (used as negative examples). We compare AMR-meta with AMRPlusPlus, DeepARG, and Meta-MARC, further testing their ensemble via a voting system. In cross-validation, AMR-meta has a median F-score of 0.7 (interquartile range, 0.2-0.9). On semi-synthetic metagenomic data (the external test), AMR-meta yields on average a 1.3-fold hit rate increase over existing methods. In terms of run-time, AMR-meta is 3 times faster than DeepARG, 30 times faster than Meta-MARC, and as fast as AMRPlusPlus. Finally, we note that differences in AMR ontologies and observed variance of all tools in classification outputs call for further development on standardization of benchmarking data and protocols. CONCLUSIONS: AMR-meta is a fast, accurate classifier that exploits non-AMR negative sets to improve sensitivity and specificity. The differences in AMR ontologies and the high variance of all tools in classification outputs call for the deployment of standard benchmarking data and protocols, to fairly compare AMR prediction tools.


Subject(s)
Anti-Bacterial Agents , Metagenomics , Anti-Bacterial Agents/pharmacology , Drug Resistance, Bacterial/genetics , High-Throughput Nucleotide Sequencing , Metagenome , Metagenomics/methods
20.
Viruses ; 14(4), 2022 Apr 06.
Article in English | MEDLINE | ID: mdl-35458495

ABSTRACT

SARS-CoV-2, the causative agent of COVID-19, emerged in late 2019. The highly contagious B.1.617.2 (Delta) variant of concern (VOC) was first identified in October 2020 in India and subsequently disseminated worldwide, later becoming the dominant lineage in the US. Understanding the local transmission dynamics of early SARS-CoV-2 introductions may inform actionable mitigation efforts during subsequent pandemic waves. Yet, despite considerable genomic analysis of SARS-CoV-2 in the US, several gaps remain. Here, we explore the early emergence of the Delta variant in Florida, US using phylogenetic analysis of representative Florida and globally sampled genomes. We find multiple independent introductions into Florida primarily from North America and Europe, with a minority originating from Asia. These introductions led to three distinct clades that demonstrated varying relative rates of transmission and possessed five distinct substitutions that were 3-21 times more prevalent in the Florida sample as compared to the global sample. Our results underscore the benefits of routine viral genomic surveillance to monitor epidemic spread and support the need for more comprehensive genomic epidemiology studies of emerging variants. In addition, we provide a model of epidemic spread of newly emerging VOCs that can inform future public health responses.


Subject(s)
COVID-19 , SARS-CoV-2 , COVID-19/epidemiology , Florida/epidemiology , Humans , Mutation , Phylogeny , SARS-CoV-2/genetics