Results 1 - 20 of 29
1.
Crit Rev Microbiol ; 49(3): 391-413, 2023 May.
Article in English | MEDLINE | ID: mdl-35468027

ABSTRACT

Staphylococcus aureus is a notorious pathogen posing challenges in the medical industry due to drug resistance and biofilm formation. The horizon of knowledge on S. aureus pathogenesis has expanded with the advancement of data-driven bioinformatics techniques. Mining information from sequenced genomes and their expression data is an economic approach that alleviates wastage of resources and redundancy in experiments. The current review covers how big data bioinformatics has been used in the analysis of S. aureus from publicly available -omics data to uncover mechanisms of infection and inhibition. Particularly, advances in the past two decades in biomarker discovery, host responses, phenotype identification, consolidation of information, and drug development are discussed highlighting the challenges and shortcomings. Overall, the review summarizes the diverse aspects of scrupulous re-analysis of S. aureus proteomic and transcriptomic expression datasets retrieved from public repositories in terms of the efforts taken, benefits offered, and follow-up actions. The detailed review thus serves as a reference and aid for (i) Computational biologists by briefing the approaches utilized for bacterial omics re-analysis concerning S. aureus and (ii) Experimental biologists by elucidating the potential of bioinformatics in biological research to generate reliable postulates in a prompt and economical manner.


Subject(s)
Staphylococcal Infections , Staphylococcus aureus , Humans , Proteomics , Big Data , Staphylococcal Infections/drug therapy , Staphylococcal Infections/microbiology , Computational Biology
2.
Philos Trans A Math Phys Eng Sci ; 379(2194): 20200246, 2021 Apr 05.
Article in English | MEDLINE | ID: mdl-33583272

ABSTRACT

Recent advances in computing algorithms and hardware have rekindled interest in developing high-accuracy, low-cost surrogate models for simulating physical systems. The idea is to replace expensive numerical integration of complex coupled partial differential equations at fine time scales performed on supercomputers, with machine-learned surrogates that efficiently and accurately forecast future system states using data sampled from the underlying system. One particularly popular technique being explored within the weather and climate modelling community is the echo state network (ESN), an attractive alternative to other well-known deep learning architectures. Using the classical Lorenz 63 system, and the three tier multi-scale Lorenz 96 system (Thornes T, Duben P, Palmer T. 2017 Q. J. R. Meteorol. Soc. 143, 897-908. (doi:10.1002/qj.2974)) as benchmarks, we realize that previously studied state-of-the-art ESNs operate in two distinct regimes, corresponding to low and high spectral radius (LSR/HSR) for the sparse, randomly generated, reservoir recurrence matrix. Using knowledge of the mathematical structure of the Lorenz systems along with systematic ablation and hyperparameter sensitivity analyses, we show that state-of-the-art LSR-ESNs reduce to a polynomial regression model which we call Domain-Driven Regularized Regression (D2R2). Interestingly, D2R2 is a generalization of the well-known SINDy algorithm (Brunton SL, Proctor JL, Kutz JN. 2016 Proc. Natl Acad. Sci. USA 113, 3932-3937. (doi:10.1073/pnas.1517384113)). We also show experimentally that LSR-ESNs (Chattopadhyay A, Hassanzadeh P, Subramanian D. 2019 (http://arxiv.org/abs/1906.08829)) outperform HSR ESNs (Pathak J, Hunt B, Girvan M, Lu Z, Ott E. 2018 Phys. Rev. Lett. 120, 024102. (doi:10.1103/PhysRevLett.120.024102)) while D2R2 dominates both approaches. 
A significant goal in constructing surrogates is to cope with barriers to scaling in weather prediction and simulation of dynamical systems that are imposed by time and energy consumption in supercomputers. Inexact computing has emerged as a novel approach to helping with scaling. In this paper, we evaluate the performance of three models (LSR-ESN, HSR-ESN and D2R2) by varying the precision or word size of the computation as our inexactness-controlling parameter. For precisions of 64, 32 and 16 bits, we show that, surprisingly, the least expensive D2R2 method yields the most robust results and the greatest savings compared to ESNs. Specifically, D2R2 achieves 68 × in computational savings, with an additional 2 × if precision reductions are also employed, outperforming ESN variants by a large margin. This article is part of the theme issue 'Machine learning for weather and climate modelling'.
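
For reference, the classical Lorenz 63 benchmark named in this abstract is usually written as the system below. The symbols and the commonly quoted chaotic parameter values are standard textbook conventions, not details taken from the abstract. Each right-hand side is a polynomial of degree at most two in the state, which may give some intuition for why a polynomial regression can act as a surrogate for this system.

```latex
\begin{aligned}
\dot{x} &= \sigma\,(y - x),\\
\dot{y} &= x\,(\rho - z) - y,\\
\dot{z} &= x\,y - \beta\,z,
\end{aligned}
\qquad \sigma = 10,\quad \rho = 28,\quad \beta = 8/3 .
```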

3.
J Biomed Inform ; 119: 103833, 2021 07.
Article in English | MEDLINE | ID: mdl-34111555

ABSTRACT

Adverse Drug Events (ADEs) are prevalent, costly, and sometimes preventable. Post-marketing drug surveillance aims to monitor ADEs that occur after a drug is released to market. Reports of such ADEs are aggregated by reporting systems, such as the Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS). In this paper, we consider the topic of how best to represent data derived from reports in FAERS for the purpose of detecting post-marketing surveillance signals, in order to inform regulatory decision making. In our previous work, we developed aer2vec, a method for deriving distributed representations (concept embeddings) of drugs and side effects from ADE reports, establishing the utility of distributional information for pharmacovigilance signal detection. In this paper, we advance this line of research further by evaluating the utility of encoding orthographic and lexical information. We do so by adapting two Natural Language Processing methods, subword embedding and vector retrofitting, which were developed to encode such information into word embeddings. Models were compared for their ability to distinguish between positive and negative examples in a set of manually curated drug/ADE relationships, with both aer2vec enhancements offering advantages in performances over baseline models, and best performance obtained when retrofitting and subword embeddings were applied in concert. In addition, this work demonstrates that models leveraging distributed representations do not require extensive manual preprocessing to perform well on this pharmacovigilance signal detection task, and may even benefit from information that would otherwise be lost during the normalization and standardization process.


Subject(s)
Drug-Related Side Effects and Adverse Reactions , Pharmacovigilance , Adverse Drug Reaction Reporting Systems , Humans , Natural Language Processing , United States , United States Food and Drug Administration
4.
J Biomed Inform ; 119: 103818, 2021 07.
Article in English | MEDLINE | ID: mdl-34022420

ABSTRACT

OBJECTIVE: Study the impact of local policies on near-future hospitalization and mortality rates. MATERIALS AND METHODS: We introduce a novel risk-stratified SIR-HCD model that introduces new variables to model the dynamics of low-contact (e.g., work from home) and high-contact (e.g., work on-site) subpopulations while sharing parameters to control their respective R0(t) over time. We test our model on data of daily reported hospitalizations and cumulative mortality of COVID-19 in Harris County, Texas, from May 1, 2020, until October 4, 2020, collected from multiple sources (USA FACTS, U.S. Bureau of Labor Statistics, Southeast Texas Regional Advisory Council COVID-19 report, TMC daily news, and Johns Hopkins University county-level mortality reporting). RESULTS: We evaluated our model's forecasting accuracy in Harris County, TX (the most populated county in the Greater Houston area) during Phase-I and Phase-II reopening. Not only does our model outperform other competing models, but it also supports counterfactual analysis to simulate the impact of future policies in a local setting, which is unique among existing approaches. DISCUSSION: Mortality and hospitalization rates are significantly impacted by local quarantine and reopening policies. Existing models do not directly account for the effect of these policies on infection, hospitalization, and death rates in an explicit and explainable manner. Our work is an attempt to improve prediction of these trends by incorporating this information into the model, thus supporting decision-making. CONCLUSION: Our work is a timely effort to attempt to model the dynamics of pandemics under the influence of local policies.
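
As a rough illustration of how a time-varying contact rate enters a compartmental model, the sketch below steps a plain SIR model with forward Euler. It is not the authors' risk-stratified SIR-HCD model: the single population, the function names, the policy switch on day 30, and all numeric values are hypothetical assumptions chosen only to show the mechanism.

```python
def sir_step(s, i, r, beta, gamma, dt=1.0):
    """Advance a basic SIR model by one forward-Euler step of length dt (days)."""
    n = s + i + r
    new_infections = beta * s * i / n * dt   # S -> I flow
    new_recoveries = gamma * i * dt          # I -> R flow
    return s - new_infections, i + new_infections - new_recoveries, r + new_recoveries


def simulate(s0, i0, r0, beta_of_day, gamma, days):
    """Run the model for `days` steps; beta_of_day(day) gives that day's contact rate."""
    state = (s0, i0, r0)
    history = [state]
    for day in range(days):
        state = sir_step(*state, beta_of_day(day), gamma)
        history.append(state)
    return history


def beta_policy(day):
    # Hypothetical policy: the contact rate drops on day 30 (illustrative values only).
    return 0.30 if day < 30 else 0.12


GAMMA = 0.10  # hypothetical recovery rate per day, so R0(t) = beta(t) / GAMMA
history = simulate(990_000.0, 10_000.0, 0.0, beta_policy, GAMMA, days=120)
peak_infected = max(i for _, i, _ in history)
```

Swapping in a different `beta_policy` is the simplest form of the counterfactual analysis the abstract describes: the same initial state is replayed under an alternative policy schedule and the trajectories are compared.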


Subject(s)
COVID-19 , Hospitalization , Humans , Pandemics , Policy , SARS-CoV-2 , United States
5.
Genomics ; 111(6): 1431-1446, 2019 12.
Article in English | MEDLINE | ID: mdl-30304708

ABSTRACT

sRNAs are important post-transcriptional regulators in bacteria. The current study exploits the potential of next-generation technology with computational analyses to develop a whole-genome sRNA-gene network for drug-resistant S. aureus by subjecting public expression profiles to a novel analysis pipeline. Clustering and examination of the resultant global interactome indicated coordinated regulation of numerous processes by various sRNAs, with 9 sRNAs and 10 genes as potential hubs. Ten major sRNA modules were annotated with various functions, among which a major module consisting of Rsa sRNAs was predicted to be a central regulatory unit. In addition, sRNA95, a hub molecule associated with this unit, was predicted to be a vulnerable target. Finally, novel associations between transcriptional regulators and sRNAs have been mined, resulting in some insights into the association between RNAIII and RsaA. To our knowledge, this is the first study in S. aureus to offer insights into global sRNA-gene interactions and to identify potential sRNAs for exploring sRNA-based therapeutic applications.


Subject(s)
Bacterial Proteins/genetics , Gene Expression Regulation, Bacterial , Genome, Bacterial , RNA, Small Untranslated/genetics , RNA-Seq/methods , Staphylococcus aureus/genetics , Bacterial Proteins/metabolism , Computational Biology , Gene Regulatory Networks , RNA, Small Untranslated/metabolism , Staphylococcal Infections/genetics , Staphylococcal Infections/microbiology , Staphylococcus aureus/growth & development , Staphylococcus aureus/metabolism , Transcriptome
6.
JMIR Diabetes ; 9: e53338, 2024 Aug 07.
Article in English | MEDLINE | ID: mdl-39110490

ABSTRACT

BACKGROUND: Diabetic ketoacidosis (DKA) is the leading cause of morbidity and mortality in pediatric type 1 diabetes (T1D), occurring in approximately 20% of patients, with an economic cost of $5.1 billion/year in the United States. Despite multiple risk factors for postdiagnosis DKA, there is still a need for explainable, clinic-ready models that accurately predict DKA hospitalization in established patients with pediatric T1D. OBJECTIVE: We aimed to develop an interpretable machine learning model to predict the risk of postdiagnosis DKA hospitalization in children with T1D using routinely collected time-series of electronic health record (EHR) data. METHODS: We conducted a retrospective case-control study using EHR data from 1787 patients from among 3794 patients with T1D treated at a large tertiary care US pediatric health system from January 2010 to June 2018. We trained a state-of-the-art, explainable, gradient-boosted ensemble (XGBoost) of decision trees with 44 regularly collected EHR features to predict postdiagnosis DKA. We measured the model's predictive performance using the area under the receiver operating characteristic curve, weighted F1-score, weighted precision, and recall, in a 5-fold cross-validation setting. We analyzed Shapley values to interpret the learned model and gain insight into its predictions. RESULTS: Our model distinguished the cohort that develops DKA postdiagnosis from the one that does not (P<.001). It predicted postdiagnosis DKA risk with an area under the receiver operating characteristic curve of 0.80 (SD 0.04), a weighted F1-score of 0.78 (SD 0.04), and a weighted precision and recall of 0.83 (SD 0.03) and 0.76 (SD 0.05), respectively, using a relatively short history of data from routine clinic follow-ups post diagnosis. On analyzing Shapley values of the model output, we identified key risk factors predicting postdiagnosis DKA both at the cohort and individual levels. 
We observed sharp changes in postdiagnosis DKA risk with respect to 2 key features (diabetes age and glycated hemoglobin at 12 months), yielding time intervals and glycated hemoglobin cutoffs for potential intervention. By clustering model-generated Shapley values, we automatically stratified the cohort into 3 groups with 5%, 20%, and 48% risk of postdiagnosis DKA. CONCLUSIONS: We have built an explainable, predictive, machine learning model with potential for integration into clinical workflow. The model risk-stratifies patients with pediatric T1D and identifies patients with the highest postdiagnosis DKA risk using limited follow-up data starting from the time of diagnosis. The model identifies key time points and risk factors to direct clinical interventions at both the individual and cohort levels. Further research with data from multiple hospital systems can help us assess how well our model generalizes to other populations. The clinical importance of our work is that the model can predict patients most at risk for postdiagnosis DKA and identify preventive interventions based on mitigation of individualized risk factors.

7.
J Clin Endocrinol Metab ; 109(2): 402-412, 2024 Jan 18.
Article in English | MEDLINE | ID: mdl-37683082

ABSTRACT

CONTEXT: Thyroid nodule ultrasound-based risk stratification schemas rely on the presence of high-risk sonographic features. However, some malignant thyroid nodules have benign appearance on thyroid ultrasound. New methods for thyroid nodule risk assessment are needed. OBJECTIVE: We investigated polygenic risk score (PRS) accounting for inherited thyroid cancer risk combined with ultrasound-based analysis for improved thyroid nodule risk assessment. METHODS: The convolutional neural network classifier was trained on thyroid ultrasound still images and cine clips from 621 thyroid nodules. Phenome-wide association study (PheWAS) and PRS PheWAS were used to optimize PRS for distinguishing benign and malignant nodules. PRS was evaluated in 73 346 participants in the Colorado Center for Personalized Medicine Biobank. RESULTS: When the deep learning model output was combined with thyroid cancer PRS and genetic ancestry estimates, the area under the receiver operating characteristic curve (AUROC) of the benign vs malignant thyroid nodule classifier increased from 0.83 to 0.89 (DeLong, P value = .007). The combined deep learning and genetic classifier achieved a clinically relevant sensitivity of 0.95, 95% CI [0.88-0.99], specificity of 0.63 [0.55-0.70], and positive and negative predictive values of 0.47 [0.41-0.58] and 0.97 [0.92-0.99], respectively. AUROC improvement was consistent in European ancestry-stratified analysis (0.83 and 0.87 for deep learning and deep learning combined with PRS classifiers, respectively). Elevated PRS was associated with a greater risk of thyroid cancer structural disease recurrence (ordinal logistic regression, P value = .002). CONCLUSION: Augmenting ultrasound-based risk assessment with PRS improves diagnostic accuracy.
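
The four diagnostic quantities reported in this abstract (sensitivity, specificity, and positive and negative predictive values) all derive from the same 2x2 confusion matrix. The sketch below shows the standard definitions; the function name and the counts are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of malignant cases flagged
        "specificity": tn / (tn + fp),  # share of benign cases cleared
        "ppv": tp / (tp + fp),          # share of positive calls that are truly positive
        "npv": tn / (tn + fn),          # share of negative calls that are truly negative
    }


# Hypothetical counts, for illustration only.
metrics = diagnostic_metrics(tp=40, fp=20, tn=80, fn=10)
```

Sensitivity and specificity are properties of the classifier, whereas the predictive values also depend on how common malignancy is in the evaluated cohort, so they do not transfer directly between populations with different prevalence.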


Subject(s)
Thyroid Neoplasms , Thyroid Nodule , Humans , Thyroid Nodule/diagnostic imaging , Thyroid Nodule/genetics , Genetic Risk Score , Sensitivity and Specificity , Neoplasm Recurrence, Local , Thyroid Neoplasms/diagnostic imaging , Thyroid Neoplasms/genetics , Ultrasonography/methods
8.
Sci Am ; 319(4): 74-79, 2018 Sep 18.
Article in English | MEDLINE | ID: mdl-30273319
9.
medRxiv ; 2023 Apr 04.
Article in English | MEDLINE | ID: mdl-37066407

ABSTRACT

An objective method to identify imminent or current Multisystem Inflammatory Syndrome in Children (MIS-C) infected with SARS-CoV-2 is highly desirable. The aim was to define an algorithmically interpreted novel cytokine/chemokine assay panel providing such an objective classification. This study was conducted on 4 groups of patients seen at multiple sites of Texas Children's Hospital, Houston, TX who consented to provide blood samples to our COVID-19 Biorepository. Standard laboratory markers of inflammation and a novel cytokine/chemokine array were measured in blood samples of all patients. Group 1 consisted of 72 COVID-19, 66 MIS-C and 63 uninfected control patients seen between May 2020 and January 2021 and predominantly infected with pre-alpha variants. Group 2 consisted of 29 COVID-19 and 43 MIS-C patients seen between January-May 2021 infected predominantly with the alpha variant. Group 3 consisted of 30 COVID-19 and 32 MIS-C patients seen between August-October 2021 infected with alpha and/or delta variants. Group 4 consisted of 20 COVID-19 and 46 MIS-C patients seen between October 2021-January 2022 infected with delta and/or omicron variants. Group 1 was used to train an L1-regularized logistic regression model which was validated using 5-fold cross validation, and then separately validated against the remaining naïve groups. The area under receiver operating curve (AUROC) and F1-score were used to quantify the performance of the algorithmically interpreted cytokine/chemokine assay panel. Standard laboratory markers predict MIS-C with a 5-fold cross-validated AUROC of 0.86 ± 0.05 and an F1 score of 0.78 ± 0.07, while the cytokine/chemokine panel predicted MIS-C with a 5-fold cross-validated AUROC of 0.95 ± 0.02 and an F1 score of 0.91 ± 0.04, with only sixteen of the forty-five cytokines/chemokines sufficient to achieve this performance. 
Tested on Group 2 the cytokine/chemokine panel yielded AUROC = 0.98 and F1 = 0.93, on Group 3 it yielded AUROC = 0.89 and F1 = 0.89, and on Group 4 AUROC = 0.99 and F1 = 0.97. Adding standard laboratory markers to the cytokine/chemokine panel did not improve performance. A top-10 subset of these 16 cytokines achieves equivalent performance on the validation data sets. Our findings demonstrate that a sixteen-cytokine/chemokine panel as well as the top ten subset provides a sensitive, specific method to identify MIS-C in patients infected with SARS-CoV-2 of all the major variants identified to date.

10.
J Clin Med ; 12(17)2023 Aug 22.
Article in English | MEDLINE | ID: mdl-37685502

ABSTRACT

While pediatric COVID-19 is rarely severe, a small fraction of children infected with SARS-CoV-2 go on to develop multisystem inflammatory syndrome (MIS-C), with substantial morbidity. An objective method with high specificity and high sensitivity to identify current or imminent MIS-C in children infected with SARS-CoV-2 is highly desirable. The aim was to learn about an interpretable novel cytokine/chemokine assay panel providing such an objective classification. This retrospective study was conducted on four groups of pediatric patients seen at multiple sites of Texas Children's Hospital, Houston, TX who consented to provide blood samples to our COVID-19 Biorepository. Standard laboratory markers of inflammation and a novel cytokine/chemokine array were measured in blood samples of all patients. Group 1 consisted of 72 COVID-19, 70 MIS-C and 63 uninfected control patients seen between May 2020 and January 2021 and predominantly infected with pre-alpha variants. Group 2 consisted of 29 COVID-19 and 43 MIS-C patients seen between January and May 2021 infected predominantly with the alpha variant. Group 3 consisted of 30 COVID-19 and 32 MIS-C patients seen between August and October 2021 infected with alpha and/or delta variants. Group 4 consisted of 20 COVID-19 and 46 MIS-C patients seen between October 2021 and January 2022 infected with delta and/or omicron variants. Group 1 was used to train an L1-regularized logistic regression model which was tested using five-fold cross validation, and then separately validated against the remaining naïve groups. The area under receiver operating curve (AUROC) and F1-score were used to quantify the performance of the cytokine/chemokine assay-based classifier. 
Standard laboratory markers predict MIS-C with a five-fold cross-validated AUROC of 0.86 ± 0.05 and an F1 score of 0.78 ± 0.07, while the cytokine/chemokine panel predicted MIS-C with a five-fold cross-validated AUROC of 0.95 ± 0.02 and an F1 score of 0.91 ± 0.04, with only sixteen of the forty-five cytokines/chemokines sufficient to achieve this performance. Tested on Group 2 the cytokine/chemokine panel yielded AUROC = 0.98 and F1 = 0.93, on Group 3 it yielded AUROC = 0.89 and F1 = 0.89, and on Group 4 AUROC = 0.99 and F1 = 0.97. Adding standard laboratory markers to the cytokine/chemokine panel did not improve performance. A top-10 subset of these 16 cytokines achieves equivalent performance on the validation data sets. Our findings demonstrate that a sixteen-cytokine/chemokine panel as well as the top ten subset provides a highly sensitive and specific method to identify MIS-C in patients infected with SARS-CoV-2 of all the major variants identified to date.
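
AUROC, the headline metric in this abstract, can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that pairwise definition, with ties counted as half. The function name, labels, and scores are hypothetical, and the quadratic loop is meant for clarity rather than for large cohorts.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores: 1 = MIS-C, 0 = COVID-19 without MIS-C.
example_labels = [1, 1, 0, 0, 1, 0]
example_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
example_auroc = auroc(example_labels, example_scores)  # 8 of 9 pairs are ordered correctly
```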

11.
BMC Bioinformatics ; 13 Suppl 13: S10, 2012.
Article in English | MEDLINE | ID: mdl-23320818

ABSTRACT

BACKGROUND: Considerable progress has been made on algorithms for learning the structure of Bayesian networks from data. Model averaging by using bootstrap replicates with feature selection by thresholding is a widely used solution for learning features with high confidence. Yet, in the context of limited data many questions remain unanswered. What scoring functions are most effective for model averaging? Does the bias arising from the discreteness of the bootstrap significantly affect learning performance? Is it better to pick the single best network or to average multiple networks learnt from each bootstrap resample? How should thresholds for learning statistically significant features be selected? RESULTS: The best scoring functions are Dirichlet Prior Scoring Metric with small λ and the Bayesian Dirichlet metric. Correcting the bias arising from the discreteness of the bootstrap worsens learning performance. It is better to pick the single best network learnt from each bootstrap resample. We describe a permutation based method for determining significance thresholds for feature selection in bagged models. We show that in contexts with limited data, Bayesian bagging using the Dirichlet Prior Scoring Metric (DPSM) is the most effective learning strategy, and that modifying the scoring function to penalize complex networks hampers model averaging. We establish these results using a systematic study of two well-known benchmarks, specifically ALARM and INSURANCE. We also apply our network construction method to gene expression data from the Cancer Genome Atlas Glioblastoma multiforme dataset and show that survival is related to clinical covariates age and gender and clusters for interferon induced genes and growth inhibition genes. CONCLUSIONS: For small data sets, our approach performs significantly better than previously published methods.


Subject(s)
Artificial Intelligence , Gene Expression Profiling/statistics & numerical data , Models, Statistical , Algorithms , Bayes Theorem , Cluster Analysis , Female , Glioblastoma/genetics , Humans , Male
12.
BMC Bioinformatics ; 13 Suppl 13: S2, 2012.
Article in English | MEDLINE | ID: mdl-23320851

ABSTRACT

BACKGROUND: Electronic Health Records aggregated in Clinical Data Warehouses (CDWs) promise to revolutionize Comparative Effectiveness Research and suggest new avenues of research. However, the effectiveness of CDWs is diminished by the lack of properly labeled data. We present a novel approach that integrates knowledge from the CDW, the biomedical literature, and the Unified Medical Language System (UMLS) to perform high-throughput phenotyping. In this paper, we automatically construct a graphical knowledge model and then use it to phenotype breast cancer patients. We compare the performance of this approach to using MetaMap when labeling records. RESULTS: MetaMap's overall accuracy at identifying breast cancer patients was 51.1% (n=428); recall=85.4%, precision=26.2%, and F1=40.1%. Our unsupervised graph-based high-throughput phenotyping had accuracy of 84.1%; recall=46.3%, precision=61.2%, and F1=52.8%. CONCLUSIONS: We conclude that our approach is a promising alternative for unsupervised high-throughput phenotyping.
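
The F1 values quoted in this abstract follow from the reported precision and recall, since F1 is their harmonic mean. The short sketch below recomputes them; the function and variable names are illustrative, while the inputs are the rounded percentages from the abstract expressed as fractions.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (defined as 0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Inputs are the rounded values reported in the abstract.
metamap_f1 = f1_score(precision=0.262, recall=0.854)
graph_f1 = f1_score(precision=0.612, recall=0.463)
```

The MetaMap figure recomputes to about 40.1%, matching the abstract. The graph-based figure recomputes to about 52.7% against the reported 52.8%, a small gap that may simply reflect rounding of the published precision and recall.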


Subject(s)
Breast Neoplasms/classification , Computer Simulation , Electronic Health Records , Models, Biological , Female , Humans , Phenotype , Unified Medical Language System
13.
AMIA Annu Symp Proc ; 2022: 1163-1172, 2022.
Article in English | MEDLINE | ID: mdl-37128462

ABSTRACT

Adverse event reports (AER) are widely used for post-market drug safety surveillance and drug repurposing, with the assumption that drugs with similar side-effects may have similar therapeutic effects also. In this study, we used distributed representations of drugs derived from the Food and Drug Administration (FDA) AER system using aer2vec, a method of representing AER, with drug embeddings emerging from a neural network trained to predict the probability of adverse drug effects given observed drugs. We combined these representations with molecular features to predict permeability of the blood-brain barrier to drugs, a prerequisite to their application to treat conditions of the central nervous system. Across multiple machine learning classifiers, the addition of distributed representations improved performance over prior methods using drug-drug similarity estimates derived from discrete representations of AER system data. Embedding-based approaches outperformed those using discrete statistics, with improvements in absolute AUC of 5% and 9%, corresponding to improvements of 9% and 13% over performance with molecular features only. Performance was retained when reducing embedding dimensions from 500 to 6, indicating that they are neither attributable to overfitting, nor to a difference in the number of trainable parameters. These results indicate that aer2vec distributed representations carry information that is valuable for drug repurposing.


Subject(s)
Blood-Brain Barrier , Drug-Related Side Effects and Adverse Reactions , Humans , Pharmaceutical Preparations , Neural Networks, Computer , Machine Learning
14.
AMIA Jt Summits Transl Sci Proc ; 2022: 349-358, 2022.
Article in English | MEDLINE | ID: mdl-35854716

ABSTRACT

Although pharmaceutical products undergo clinical trials to profile efficacy and safety, some adverse drug reactions (ADRs) are only discovered after release to market. Post-market drug safety surveillance - pharmacovigilance - leverages information from various sources to proactively identify such ADRs. Clinical notes are one source of observational data that could assist this process, but their inherent complexity can obfuscate possible ADR signals. In previous research, embeddings trained on observational reports have improved detection of such signals over commonly used statistical measures. Moreover, neural embedding methods which further encode juxtapositional information have shown promise on analogical retrieval tasks, suggesting proximity-based alternatives to document-level modeling for signal detection. This work uses natural language processing and locality sensitive neural embeddings to increase ADR signal recovery from clinical notes, with AUCs of ~0.63-0.71. Constituting a ~50% increase over baselines, our method sets the state-of-the-art for these reference standards when solely leveraging clinical notes.

15.
Infect Genet Evol ; 88: 104702, 2021 03.
Article in English | MEDLINE | ID: mdl-33388440

ABSTRACT

Biofilm forming Staphylococcus aureus is a major threat to the health-care industry. It is important to understand the differences between planktonic and biofilm growth forms in the pathogen since conventional treatments targeting the planktonic forms are not effective against biofilms. The current study conducts a meta-analysis of three public transcriptomic profiles to examine the differences in gene expression between the planktonic and biofilm states of S. aureus using random-effects modeling. Mean effect sizes were calculated for 2847 genes among which 726 differentially expressed genes were taken for further analysis. Major genes that are discriminatory between the two conditions were mined using supervised learning techniques and validated by high-accuracy classifiers. Ten different feature selection algorithms were applied and used to rank the most important genes in S. aureus biofilms. Finally, an optimal set of 36 genes are presented as candidate genes in biofilm formation or development while throwing light on the novel roles of an acyl-CoA thioesterase enzyme and 10 hypothetical proteins in biofilms. The relevance of the identified gene set was further validated by building five different classification models using SVM, RF, kNN, NB and DT algorithms that were compared with models built from other relevant gene sets and by reviewing the functional role of 25 previously known genes in biofilm development. The study combines meta-analysis of differential expression with supervised machine learning strategies and feature selection for the first time to identify and validate a discriminatory set of genes important in biofilms of S. aureus. The functional roles of the identified genes predicted to be important in biofilms are further scrutinized and can be considered as a signature target list to develop anti-biofilm therapeutics in S. aureus.
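
The abstract does not say which random-effects estimator was used. As one common possibility, the sketch below pools per-study effect sizes for a single gene with the DerSimonian-Laird method; the choice of estimator, the function name, and the three effect sizes and variances are all assumptions made for illustration.

```python
def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling of per-study effect sizes.

    Returns (pooled_effect, tau2, standard_error).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                    # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = (1.0 / sum(w_star)) ** 0.5
    return pooled, tau2, se


# Hypothetical effect sizes for one gene (biofilm vs planktonic) in three datasets.
pooled, tau2, se = random_effects_pool([1.2, 0.8, 1.5], [0.10, 0.15, 0.20])
```

When the studies agree closely, the between-study variance `tau2` is truncated to zero and the result coincides with the fixed-effect estimate; larger disagreement spreads the weights more evenly across studies.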


Subject(s)
Biofilms , Staphylococcal Infections/microbiology , Staphylococcus aureus/growth & development , Staphylococcus aureus/genetics , Supervised Machine Learning , Transcriptome , Algorithms , Datasets as Topic , Gene Expression Regulation, Bacterial , Humans , Microarray Analysis , RNA-Seq
16.
BMC Bioinformatics ; 11: 163, 2010 Mar 31.
Article in English | MEDLINE | ID: mdl-20356373

ABSTRACT

BACKGROUND: Identifying candidate genes in genetic networks is important for understanding regulation and biological function. Large gene expression datasets contain relevant information about genetic networks, but mining the data is not a trivial task. Algorithms that infer Bayesian networks from expression data are powerful tools for learning complex genetic networks, since they can incorporate prior knowledge and uncover higher-order dependencies among genes. However, these algorithms are computationally demanding, so novel techniques that allow targeted exploration for discovering new members of known pathways are essential. RESULTS: Here we describe a Bayesian network approach that addresses a specific network within a large dataset to discover new components. Our algorithm draws individual genes from a large gene-expression repository, and ranks them as potential members of a known pathway. We apply this method to discover new components of the cAMP-dependent protein kinase (PKA) pathway, a central regulator of Dictyostelium discoideum development. The PKA network is well studied in D. discoideum but the transcriptional networks that regulate PKA activity and the transcriptional outcomes of PKA function are largely unknown. Most of the genes highly ranked by our method encode either known components of the PKA pathway or are good candidates. We tested 5 uncharacterized highly ranked genes by creating mutant strains and identified a candidate cAMP-response element-binding protein, yet undiscovered in D. discoideum, and a histidine kinase, a candidate upstream regulator of PKA activity. CONCLUSIONS: The single-gene expansion method is useful in identifying new components of known pathways. The method takes advantage of the Bayesian framework to incorporate prior biological knowledge and discovers higher-order dependencies among genes while greatly reducing the computational resources required to process high-throughput datasets.


Subject(s)
Cyclic AMP-Dependent Protein Kinases/metabolism , Dictyostelium/enzymology , Gene Expression , Genomics/methods , Bayes Theorem , Dictyostelium/genetics , Dictyostelium/growth & development , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Signal Transduction/genetics
17.
Drug Saf ; 43(1): 67-77, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31646442

ABSTRACT

INTRODUCTION: As a result of the well-documented limitations of data collected by spontaneous reporting systems (SRS), such as bias and under-reporting, a number of authors have evaluated the utility of other data sources for the purpose of pharmacovigilance, including the biomedical literature. Previous work has demonstrated the utility of literature-derived distributed representations (concept embeddings) with machine learning for the purpose of drug side-effect prediction. In terms of data sources, these methods are complementary, observing drug safety from two different perspectives (knowledge extracted from the literature and statistics from SRS data). However, the combined utility of these pharmacovigilance methods has yet to be evaluated. OBJECTIVE: This research investigates the utility of directly or indirectly combining an observational signal from SRS with literature-derived distributed representations into a single feature vector or in an ensemble approach for downstream machine learning (logistic regression). METHODS: Leveraging a recently developed representation scheme, concept embeddings were generated from relational connections extracted from the literature and composed to represent drugs and their associated adverse reactions, as defined by two reference standards of positive (likely causal) and negative (no causal evidence) pairs. Embeddings were presented with and without common measures of observational signal from SRS sources to logistic regressors, and performance was evaluated with the receiver operating characteristic (ROC) area under the curve (AUC) metric. RESULTS: ROC AUC performance with these composite models improves by up to ≈ 20% over SRS-based disproportionality metrics alone and exceeds the best prior results reported in the literature when models leverage both sources of information. CONCLUSIONS: Results from this study support the hypothesis that knowledge extracted from the literature can enhance the performance of SRS-based methods (and vice versa). Across reference sets, using literature and SRS information together performed better than using either source alone, providing strong support for the complementary nature of these approaches to post-marketing drug surveillance.


Subject(s)
Adverse Drug Reaction Reporting Systems , Product Surveillance, Postmarketing/methods , Humans , Logistic Models , Machine Learning , Pharmacovigilance
18.
Indian J Med Microbiol ; 37(2): 173-185, 2019.
Article in English | MEDLINE | ID: mdl-31745016

ABSTRACT

Context: Vancomycin-intermediate Staphylococcus aureus remains one of the most prevalent multidrug-resistant pathogens causing healthcare-associated infections that are difficult to treat. Aims: This study uses a comprehensive computational analysis to systematically investigate gene expression profiles of resistant and sensitive S. aureus strains on exposure to antibiotics. Settings and Design: The transcriptional changes leading to the development of multiple antibiotic resistance were examined by an integrative analysis of nine differential expression experiments under selected conditions of vancomycin-intermediate and -sensitive strains for four different antibiotics using publicly available RNA-Seq datasets. Materials and Methods: For each antibiotic, three experimental conditions for expression analysis were selected to identify those genes that are particularly involved in the development of resistance. The results were further scrutinised to generate a resistome that can be analysed for its role in the development of, or adaptation to, antibiotic resistance. Results: The 99 genes in the resistome were then compiled to create a multiple-drug resistome of 25 known and novel genes identified as playing a part in antibiotic resistance. The inclusion of agr genes and associated virulence factors in the identified resistome supports a role for the agr quorum-sensing system in multiple drug resistance. In addition, enrichment analysis identified the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways for quorum sensing and two-component systems in the resistome gene set. Conclusion: Further studies of the role of the identified molecular targets, such as SAA6008_00181, SAA6008_01127, agrA, agrC and coa, in adapting to antibiotic pressure at sub-inhibitory concentrations can help elucidate the molecular mechanisms that cause resistance in the pathogen and identify other potential therapeutics.


Subject(s)
Drug Resistance, Bacterial , Genes, Bacterial , Signal Transduction , Staphylococcal Infections/microbiology , Staphylococcus aureus/drug effects , Staphylococcus aureus/physiology , Vancomycin/pharmacology , Anti-Bacterial Agents/pharmacology , Gene Expression Regulation, Bacterial/drug effects , Humans , Microbial Sensitivity Tests , RNA-Seq , Virulence Factors
19.
AMIA Annu Symp Proc ; 2019: 717-726, 2019.
Article in English | MEDLINE | ID: mdl-32308867

ABSTRACT

Adverse event report (AER) data are a key source of signal for post-marketing drug surveillance. The standard methodology to analyze AER data applies disproportionality metrics, which estimate the strength of drug/side-effect associations from discrete counts of their occurrence at the report level. However, in other domains, improvements in predictive modeling accuracy have been obtained through representation learning, where discrete features are replaced by distributed representations learned from unlabeled data. This paper describes aer2vec, a novel representational approach for AER data in which concept embeddings emerge from neural networks trained to predict drug/side-effect co-occurrence. Trained models are evaluated for their utility in identifying drug/side-effect relationships, with improvements over disproportionality metrics in most cases. In addition, we evaluate the utility of an otherwise-untapped resource in the Food and Drug Administration (FDA) AER system - reporter designations of suspected causality - and find that incorporating this information enhances performance of all models evaluated.


Subject(s)
Adverse Drug Reaction Reporting Systems , Drug-Related Side Effects and Adverse Reactions/diagnosis , Models, Theoretical , Product Surveillance, Postmarketing , Databases, Factual , Humans , Neural Networks, Computer , United States , United States Food and Drug Administration
20.
AMIA Annu Symp Proc ; 2019: 992-1001, 2019.
Article in English | MEDLINE | ID: mdl-32308896

ABSTRACT

The identification of drug-drug interactions (DDIs) is important for patient safety; yet, compared to other pharmacovigilance work, a limited amount of research has been conducted in this space. Recent work has successfully applied a method of deriving distributed vector representations from structured biomedical knowledge, known as Embedding of Semantic Predications (ESP), to the problem of predicting individual drug side effects. In the current paper we extend this work by applying ESP to the problem of predicting polypharmacy side effects for particular drug combinations, building on a recent reconceptualization of this problem as a network of drug nodes connected by side-effect edges. We evaluate ESP embeddings derived from the resulting graph on a side-effect prediction task against a previously reported graph convolutional neural network approach, using the same data and evaluation methods. We demonstrate that ESP models perform better, while being faster to train, more re-usable, and significantly simpler.


Subject(s)
Drug Interactions , Drug-Related Side Effects and Adverse Reactions , Models, Biological , Neural Networks, Computer , Pharmacovigilance , Polypharmacy , Algorithms , Computational Biology , Data Visualization , Humans , Semantics