Results 1 - 20 of 134
1.
J Leukoc Biol ; 2024 Mar 13.
Article in English | MEDLINE | ID: mdl-38478636

ABSTRACT

Melanoma, caused by malignant melanocytes, is known for its invasiveness and poor prognosis. Therapies are often ineffective due to tumor heterogeneity and treatment resistance. Bacillus Calmette-Guérin (BCG), primarily a tuberculosis vaccine, shows potential in treating melanoma by activating immune responses. In this study, data from The Cancer Genome Atlas and the NCBI GEO database were utilized to identify pivotal differentially expressed genes (DEGs) such as DSC2, CXCR1, BOK, and CSTB, which are significantly upregulated in BCG-treated blood samples and are strongly associated with the prognosis of melanoma. We employed tools such as edgeR and ggplot2 for functional and pathway analysis and developed a prognostic model using LASSO Cox regression analysis to predict patient survival. A notable finding is the correlation between BCG-related genes and immune cell infiltration in melanoma, highlighting the potential of these genes as both biomarkers and therapeutic targets. Additionally, the study examines genetic alterations in these genes and their impact on the disease. This study highlights the necessity of further exploring BCG-related genes for insights into melanoma pathogenesis and treatment enhancement, suggesting that BCG's role in immune activation could offer novel therapeutic avenues in cancer treatment.
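The prognostic-modeling step (an L1-penalized, LASSO-style Cox regression on candidate gene expression) can be sketched in Python with the lifelines package. This is a minimal analogue on simulated data, not the authors' pipeline: the gene names come from the abstract, but the expression values, follow-up times, and penalizer setting are invented for illustration.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical expression matrix: rows = patients, columns = candidate genes
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 4)), columns=["DSC2", "CXCR1", "BOK", "CSTB"])
df["time"] = rng.exponential(scale=36, size=200)   # follow-up in months (simulated)
df["event"] = rng.integers(0, 2, size=200)         # 1 = death observed, 0 = censored

# L1-penalized Cox model; the penalizer strength would normally be tuned by cross-validation
cph = CoxPHFitter(penalizer=0.1, l1_ratio=1.0)
cph.fit(df, duration_col="time", event_col="event")

# Risk score = linear predictor; patients could be split into high/low risk at the median
risk = cph.predict_partial_hazard(df)
print(cph.summary[["coef", "exp(coef)"]])
print("median risk score:", risk.median())
```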

2.
Article in English | MEDLINE | ID: mdl-38145528

ABSTRACT

Resting-state electroencephalography (rs-EEG) has become an effective, low-cost evaluation approach for identifying autism spectrum disorder (ASD) in children. However, extracting useful features from raw rs-EEG data to improve diagnostic performance remains challenging. Traditional methods mainly rely on manually designed feature extractors and classifiers, which are applied separately and cannot be optimized jointly. To this end, this paper proposes a new end-to-end diagnostic method based on graph convolutional neural networks for the diagnosis of ASD in children. Inspired by neuroscience findings on the abnormal brain functional connectivity and hemispheric asymmetry observed in autism, we design a new Regional-asymmetric Adaptive Graph Convolutional Neural Network (RAGNN). It uses a hierarchical feature extraction and fusion process to learn separable spatiotemporal EEG features from different brain regions, the two hemispheres, and the global brain. In the temporal feature extraction stage, we use convolutional layers that span from individual brain areas to the hemispheres, effectively capturing temporal features both within and between brain areas. To better capture the spatial characteristics of multi-channel EEG signals, we employ adaptive graph convolutional learning to model non-Euclidean features within each hemisphere. Additionally, an attention layer is introduced to weight the different contributions of the left and right hemispheres, and the fused features are used for classification. We conducted a subject-independent cross-validation experiment on rs-EEG data from 45 children with ASD and 45 typically developing (TD) children. Experimental results show that the proposed RAGNN model outperformed several existing deep learning-based methods (ShallowNet, EEGNet, TSception, ST-GCN, and CGRU-MDGN).
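The adaptive graph convolution idea, in which the adjacency matrix over EEG channels is learned jointly with the classifier rather than fixed a priori, can be illustrated with a single PyTorch layer. This is a generic sketch of the technique, not the RAGNN architecture; the channel count, feature dimensions, and softmax row-normalization are assumptions made for the toy example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveGraphConv(nn.Module):
    """One graph-convolution layer with a fully learnable adjacency over EEG channels."""
    def __init__(self, num_channels: int, in_dim: int, out_dim: int):
        super().__init__()
        # Adjacency is a free parameter, learned jointly with the downstream classifier
        self.adj = nn.Parameter(torch.eye(num_channels) + 0.01 * torch.randn(num_channels, num_channels))
        self.weight = nn.Linear(in_dim, out_dim)

    def forward(self, x):                # x: (batch, channels, in_dim)
        a = F.softmax(self.adj, dim=-1)  # row-normalize so each channel mixes its neighbours
        return F.relu(self.weight(torch.einsum("ij,bjd->bid", a, x)))

# Toy usage: 16 EEG channels, 32 temporal features per channel, binary ASD/TD output
layer = AdaptiveGraphConv(num_channels=16, in_dim=32, out_dim=8)
head = nn.Linear(16 * 8, 2)
x = torch.randn(4, 16, 32)
logits = head(layer(x).flatten(1))
print(logits.shape)  # torch.Size([4, 2])
```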


Subjects
Autism Spectrum Disorder , Autistic Disorder , Child , Humans , Autistic Disorder/diagnosis , Autism Spectrum Disorder/diagnosis , Brain , Electroencephalography , Neural Networks, Computer
3.
Psychiatr Res Clin Pract ; 5(4): 118-125, 2023.
Article in English | MEDLINE | ID: mdl-38077277

ABSTRACT

Objective: To evaluate if a machine learning approach can accurately predict antidepressant treatment outcome using electronic health records (EHRs) from patients with depression. Method: This study examined 808 patients with depression at a New York City-based outpatient mental health clinic between June 13, 2016 and June 22, 2020. Antidepressant treatment outcome was defined based on the trend in depression symptom severity over time and was categorized as either "Recovering" or "Worsening" (i.e., non-Recovering), measured by the slope of the individual-level Patient Health Questionnaire-9 (PHQ-9) score trajectory spanning 6 months following treatment initiation. A patient was designated as "Recovering" if the slope was less than 0 and as "Worsening" if the slope was greater than or equal to 0. Multiple machine learning (ML) models, including L2-regularized Logistic Regression, Naive Bayes, Random Forest, and Gradient Boosting Decision Tree (GBDT), were used to predict treatment outcome based on additional data from EHRs, including demographics and diagnoses. Shapley Additive Explanations were applied to identify the most important predictors. Results: The GBDT achieved the best results for predicting "Recovering" (AUC: 0.7654 ± 0.0227; precision: 0.6002 ± 0.0215; recall: 0.5131 ± 0.0336). When patients with low PHQ-9 scores (<10) at baseline were excluded, the results for predicting "Recovering" were AUC: 0.7254 ± 0.0218; precision: 0.5392 ± 0.0437; recall: 0.4431 ± 0.0513. Prior diagnosis of anxiety, psychotherapy, recurrent depression, and baseline depression symptom severity were strong predictors. Conclusions: The results demonstrate the potential utility of using ML on longitudinal EHRs to predict antidepressant treatment outcome. Our predictive tool holds promise to accelerate personalized medical management in patients with psychiatric illnesses.
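A minimal sketch of this study design in Python: label each patient by the sign of the least-squares PHQ-9 slope, fit a gradient-boosted classifier, and rank predictors with SHAP. The data below are simulated and the feature set is a small invented stand-in for the EHR variables, so this only illustrates the workflow, not the reported model.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
import shap

rng = np.random.default_rng(0)

def phq9_slope(days, scores):
    # Least-squares slope of symptom severity over time; slope < 0 is labelled "Recovering"
    return np.polyfit(days, scores, deg=1)[0]

print("example slope:", phq9_slope([0, 30, 90, 180], [18, 15, 12, 9]))  # negative -> Recovering

# Simulated EHR-style features (demographics, prior diagnoses) and outcome labels
X = pd.DataFrame({
    "age": rng.integers(18, 80, 808),
    "baseline_phq9": rng.integers(0, 27, 808),
    "prior_anxiety": rng.integers(0, 2, 808),
    "recurrent_depression": rng.integers(0, 2, 808),
})
y = rng.integers(0, 2, 808)  # 1 = Recovering (slope < 0), 0 = Worsening

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingClassifier().fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))

# SHAP values rank feature contributions, mirroring the predictor-importance analysis
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)
print(pd.Series(np.abs(shap_values).mean(axis=0), index=X.columns).sort_values(ascending=False))
```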

4.
J Biomed Inform ; 148: 104534, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37918622

ABSTRACT

This work continues along a visionary path of using Semantic Web standards such as RDF and ShEx to make healthcare data easier to integrate for research and leading-edge patient care. The work extends the ability to use ShEx schemas to validate FHIR RDF data, thereby enhancing the semantic web ecosystem for working with FHIR and non-FHIR data using the same ShEx validation framework. It updates FHIR's ShEx schemas to fix outstanding issues and reflect changes in the definition of FHIR RDF. In addition, it experiments with expressing FHIRPath constraints (which are not captured in the XML or JSON schemas) in ShEx schemas. These extended ShEx schemas were incorporated into the FHIR R5 specification and used to successfully validate FHIR R5 examples that are included with the FHIR specification, revealing several errors in the examples.
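ShEx validation of FHIR RDF can be driven from Python. The sketch below assumes the pyshex package and uses a tiny made-up shape and Turtle snippet (not the actual FHIR R5 ShEx schemas or predicate names), so it illustrates only the validation workflow described above.

```python
# Minimal ShEx validation sketch; assumes the `pyshex` package is installed.
# The schema and RDF below are illustrative stand-ins, not the FHIR R5 ShEx schemas.
from pyshex import ShExEvaluator

shex_schema = """
PREFIX fhir: <http://hl7.org/fhir/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
<http://example.org/shapes/Patient> {
  fhir:gender @<http://example.org/shapes/Code>?
}
<http://example.org/shapes/Code> {
  fhir:value xsd:string
}
"""

rdf_data = """
PREFIX fhir: <http://hl7.org/fhir/>
<http://example.org/Patient/1> fhir:gender [ fhir:value "female" ] .
"""

results = ShExEvaluator(rdf=rdf_data, schema=shex_schema,
                        focus="http://example.org/Patient/1",
                        start="http://example.org/shapes/Patient").evaluate()
for r in results:
    print(r.focus, "conforms" if r.result else f"fails: {r.reason}")
```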


Subjects
Ecosystem , Electronic Health Records , Humans , Delivery of Health Care
5.
J Biomed Inform ; 144: 104442, 2023 08.
Article in English | MEDLINE | ID: mdl-37429512

ABSTRACT

OBJECTIVE: We develop a deep learning framework based on the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model using unstructured clinical notes from electronic health records (EHRs) to predict the risk of disease progression from Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD). METHODS: We identified 3657 patients diagnosed with MCI together with their progress notes from the Northwestern Medicine Enterprise Data Warehouse (NMEDW) between 2000 and 2020. Progress notes written no later than the first MCI diagnosis were used for prediction. We first preprocessed the notes by deidentifying, cleaning, and splitting them into sections, and then pre-trained a BERT model for AD (named AD-BERT) based on the publicly available Bio+Clinical BERT on the preprocessed notes. All sections of a patient were embedded into vector representations by AD-BERT and then combined by global MaxPooling and a fully connected network to compute the probability of MCI-to-AD progression. For validation, we conducted a similar set of experiments on 2563 MCI patients identified at Weill Cornell Medicine (WCM) during the same timeframe. RESULTS: Compared with 7 baseline models, the AD-BERT model achieved the best performance on both datasets, with an Area Under the receiver operating characteristic Curve (AUC) of 0.849 and an F1 score of 0.440 on the NMEDW dataset, and an AUC of 0.883 and an F1 score of 0.680 on the WCM dataset. CONCLUSION: The use of EHRs for AD-related research is promising, and AD-BERT shows superior predictive performance in modeling MCI-to-AD progression. Our study demonstrates the utility of pre-trained language models and clinical notes in predicting MCI-to-AD progression, which could have important implications for improving early detection and intervention for AD.
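A minimal sketch of the section-embedding idea: encode each note section with a publicly available clinical BERT checkpoint, take the [CLS] vectors, max-pool across sections, and feed the pooled vector to a small classification head. The checkpoint shown is the public Bio+Clinical BERT, not AD-BERT, the note text is invented, and the head below is untrained, so this only illustrates the architecture, not the reported results.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

# Publicly available clinical BERT checkpoint; AD-BERT (further pre-trained on
# institutional notes) is not publicly reproduced here.
name = "emilyalsentzer/Bio_ClinicalBERT"
tokenizer = AutoTokenizer.from_pretrained(name)
encoder = AutoModel.from_pretrained(name)

sections = [
    "Patient reports increasing forgetfulness over the past year.",
    "MoCA score 24/30; mild cognitive impairment suspected.",
]

# Encode each note section, take the [CLS] vector, then max-pool across sections
with torch.no_grad():
    cls_vectors = []
    for text in sections:
        enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
        cls_vectors.append(encoder(**enc).last_hidden_state[:, 0, :])
patient_vec, _ = torch.max(torch.cat(cls_vectors, dim=0), dim=0)

# A small (untrained) fully connected head produces a progression probability
head = nn.Sequential(nn.Linear(patient_vec.numel(), 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
print("progression probability (untrained head):", head(patient_vec).item())
```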


Subjects
Alzheimer Disease , Cognitive Dysfunction , Humans , Alzheimer Disease/diagnosis , Cognitive Dysfunction/diagnosis , Disease Progression
6.
Sci Rep ; 13(1): 8102, 2023 05 19.
Article in English | MEDLINE | ID: mdl-37208478

ABSTRACT

The objective of this study was to investigate the potential association between the use of four frequently prescribed drug classes, namely antihypertensive drugs, statins, selective serotonin reuptake inhibitors, and proton-pump inhibitors, and the likelihood of disease progression from mild cognitive impairment (MCI) to dementia, using electronic health records (EHRs). We conducted a retrospective cohort study using observational EHRs from a cohort of approximately 2 million patients seen at a large, multi-specialty urban academic medical center in New York City, USA between 2008 and 2020 to automatically emulate randomized controlled trials. For each drug class, two exposure groups were identified based on the prescription orders documented in the EHRs following their MCI diagnosis. During follow-up, we measured drug efficacy based on the incidence of dementia and estimated the average treatment effect (ATE) of various drugs. To ensure the robustness of our findings, we confirmed the ATE estimates via bootstrapping and presented the associated 95% confidence intervals (CIs). Our analysis identified 14,269 MCI patients, among whom 2501 (17.5%) progressed to dementia. Using average treatment effect estimation and bootstrapping confirmation, we observed that drugs including rosuvastatin (ATE = - 0.0140 [- 0.0191, - 0.0088], p value < 0.001), citalopram (ATE = - 0.1128 [- 0.125, - 0.1005], p value < 0.001), escitalopram (ATE = - 0.0560 [- 0.0615, - 0.0506], p value < 0.001), and omeprazole (ATE = - 0.0201 [- 0.0299, - 0.0103], p value < 0.001) had a statistically significant association with slower progression from MCI to dementia. These findings suggest that several commonly prescribed drugs may alter the progression from MCI to dementia and warrant further investigation.
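The abstract does not specify the exact ATE estimator, so the sketch below uses inverse-probability-of-treatment weighting with a logistic propensity model plus a nonparametric bootstrap for the confidence interval, one common way to emulate a trial from observational data. All data are simulated; the covariates, sample size, and simulated effect are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
# Simulated cohort: X = baseline covariates, t = drug exposure, y = progression to dementia
X = pd.DataFrame({"age": rng.normal(75, 6, n), "baseline_mmse": rng.normal(26, 2, n)})
t = rng.integers(0, 2, n)
y = rng.binomial(1, 0.15 - 0.03 * t)  # simulated protective effect of the drug

def iptw_ate(X, t, y):
    # Propensity scores from a logistic model, then inverse-probability weighting
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    w1, w0 = t / ps, (1 - t) / (1 - ps)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

ate = iptw_ate(X, t, y)

# Nonparametric bootstrap for a 95% confidence interval, as in the reported analyses
boot = []
for _ in range(200):
    idx = rng.integers(0, n, n)
    boot.append(iptw_ate(X.iloc[idx], t[idx], y[idx]))
print(f"ATE = {ate:.4f}, 95% CI = [{np.percentile(boot, 2.5):.4f}, {np.percentile(boot, 97.5):.4f}]")
```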


Subjects
Alzheimer Disease , Cognitive Dysfunction , Humans , Alzheimer Disease/diagnosis , Retrospective Studies , Electronic Health Records , Disease Progression , Cognitive Dysfunction/drug therapy , Cognitive Dysfunction/epidemiology , Cognitive Dysfunction/diagnosis , Randomized Controlled Trials as Topic
7.
J Interv Card Electrophysiol ; 66(8): 1817-1825, 2023 Nov.
Article in English | MEDLINE | ID: mdl-36738387

ABSTRACT

BACKGROUND: The ThermoCool STSF catheter is used for ablation of ischemic ventricular tachycardia (VT) in routine clinical practice, although outcomes have not been studied and the catheter does not have Food and Drug Administration (FDA) approval for this indication. We used real-world health system data to evaluate its safety and effectiveness for this indication. METHODS: Among patients undergoing ischemic VT ablation with the ThermoCool STSF catheter pooled across two health systems (Mercy Health and Mayo Clinic), the primary safety composite outcome of death, thromboembolic events, and procedural complications within 7 days was compared to a performance goal of 15%, which is twice the expected proportion of the primary composite safety outcome based on prior studies. The exploratory effectiveness outcome of rehospitalization for VT or heart failure or repeat VT ablation at up to 1 year was averaged across health systems among patients treated with the ThermoCool STSF vs. ST catheters. RESULTS: Seventy total patients received ablation for ischemic VT using the ThermoCool STSF catheter. The primary safety composite outcome occurred in 3/70 (4.3%; 90% CI, 1.2-10.7%) patients, meeting the pre-specified performance goal, p = 0.0045. At 1 year, the effectiveness outcome risk difference (STSF-ST) at Mercy was - 0.4% (90% CI: - 25.2%, 24.3%) and at Mayo Clinic was 12.6% (90% CI: - 13.0%, 38.4%); the average risk difference across both institutions was 5.8% (90% CI: - 12.0, 23.7). CONCLUSIONS: The ThermoCool STSF catheter was safe and appeared effective for ischemic VT ablation, supporting continued use of the catheter and informing possible FDA label expansion. Health system data hold promise for real-world safety and effectiveness evaluation of cardiovascular devices.
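The comparison of 3 events in 70 procedures against the 15% performance goal is consistent with an exact one-sided binomial test: on these counts it gives a p value of about 0.0045 and an exact 90% confidence interval close to the reported 1.2-10.7%. The abstract does not state which test was used, so the scipy sketch below is an illustration that matches the reported figures rather than the authors' documented method.

```python
from scipy.stats import binomtest

# 3 primary safety events among 70 ischemic VT ablations, tested against a 15% performance goal
res = binomtest(k=3, n=70, p=0.15, alternative="less")
print(f"observed proportion = {3 / 70:.3f}")        # ~0.043
print(f"one-sided p value   = {res.pvalue:.4f}")    # ~0.0045
ci = res.proportion_ci(confidence_level=0.90, method="exact")
print(f"90% CI = ({ci.low:.3f}, {ci.high:.3f})")    # roughly (0.012, 0.107)
```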


Subjects
Catheter Ablation , Tachycardia, Ventricular , Humans , Treatment Outcome , Tachycardia, Ventricular/therapy , Arrhythmias, Cardiac/surgery , Catheters , Catheter Ablation/adverse effects
8.
JAMIA Open ; 6(1): ooac108, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36632328

ABSTRACT

The objective of this study is to describe the application of the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) to support medical device real-world evaluation in a National Evaluation System for health Technology Coordinating Center (NESTcc) Test-Case involving 2 healthcare systems, Mercy Health and Mayo Clinic. CDM implementation was coordinated across the 2 healthcare systems and their multiple hospitals to aggregate both medical device data from supply chain databases and patient outcomes and covariates from electronic health record data. Several data quality assurance (QA) analyses were implemented on the OMOP CDM to validate the data extraction, transformation, and load (ETL) process. OMOP CDM-based data of relevant patient encounters were successfully established to support studies for FDA regulatory submissions. QA analyses verified that the data transformation was robust between the data sources and the OMOP CDM. Our efforts provided useful insights into real-world data integration using the OMOP CDM for medical device evaluation coordinated across multiple healthcare systems.

9.
Sci Rep ; 13(1): 294, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36609415

ABSTRACT

Left ventricular ejection fraction (EF) is a key measure in the diagnosis and treatment of heart failure (HF), and many patients experience changes in EF over time. Large-scale analysis of longitudinal changes in EF using electronic health records (EHRs) is limited. In a multi-site retrospective study using EHR data from three academic medical centers, we investigated longitudinal changes in EF measurements in patients diagnosed with HF. We observed that the baseline characteristics and longitudinal EF change behavior of our HF cohorts differed significantly from those of a previous study based on HF registry data. Data gathered from this longitudinal study were used to develop multiple machine learning models to predict changes in ejection fraction measurements in HF patients. Across all three sites, we observed higher performance in predicting EF increase over a 1-year duration, with similarly higher performance predicting an EF increase of 30% from baseline compared to lower percentage increases. In predicting EF decrease, we found moderate to high performance with low confidence for various models. Among the machine learning models evaluated, XGBoost was the best performing model for predicting EF changes. Across the three sites, the XGBoost model had an F1-score of 87.2, 89.9, and 88.6 and an AUC of 0.83, 0.87, and 0.90 in predicting a 30% increase in EF, and an F1-score of 95.0, 90.6, and 90.1 and an AUC of 0.54, 0.56, and 0.68 in predicting a 30% decrease in EF. Among the features contributing to the prediction of EF changes, baseline ejection fraction measurement, age, gender, and heart diseases were found to be statistically significant.
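A minimal XGBoost sketch of the prediction task: binary classification of a 30% EF increase from a handful of EHR-style features, evaluated with F1 and AUC. The features and labels below are simulated stand-ins, so the numbers it prints have no relation to the reported site-level results.

```python
import numpy as np
import pandas as pd
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, roc_auc_score

rng = np.random.default_rng(0)
n = 2000
# Simulated stand-ins for the EHR-derived features named in the study
X = pd.DataFrame({
    "baseline_ef": rng.uniform(15, 65, n),
    "age": rng.integers(40, 90, n),
    "male": rng.integers(0, 2, n),
    "ischemic_hd": rng.integers(0, 2, n),
})
# Label: >=30% relative EF increase within one year (simulated)
y = rng.binomial(1, 0.3, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
model.fit(X_tr, y_tr)

pred = model.predict(X_te)
prob = model.predict_proba(X_te)[:, 1]
print(f"F1 = {f1_score(y_te, pred):.3f}, AUC = {roc_auc_score(y_te, prob):.3f}")
```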


Subjects
Heart Failure , Ventricular Function, Left , Humans , Electronic Health Records , Longitudinal Studies , Machine Learning , Prognosis , Retrospective Studies , Stroke Volume
10.
Genomics Proteomics Bioinformatics ; 20(5): 850-866, 2022 10.
Article in English | MEDLINE | ID: mdl-36462630

ABSTRACT

The recent development of imaging and sequencing technologies enables systematic advances in the clinical study of lung cancer. Meanwhile, the human mind is limited in its ability to effectively handle and fully utilize such enormous accumulations of data. Machine learning-based approaches play a critical role in integrating and analyzing these large and complex datasets, which have extensively characterized lung cancer from different perspectives. In this review, we provide an overview of machine learning-based approaches that strengthen various aspects of lung cancer diagnosis and therapy, including early detection, auxiliary diagnosis, prognosis prediction, and immunotherapy practice. Moreover, we highlight the challenges and opportunities for future applications of machine learning in lung cancer.


Subjects
Lung Neoplasms , Humans , Lung Neoplasms/diagnosis , Lung Neoplasms/genetics , Lung Neoplasms/therapy , Machine Learning
11.
J Biomed Inform ; 134: 104201, 2022 10.
Article in English | MEDLINE | ID: mdl-36089199

ABSTRACT

BACKGROUND: Knowledge graphs (KGs) play a key role in enabling explainable artificial intelligence (AI) applications in healthcare. Constructing clinical knowledge graphs (CKGs) from heterogeneous electronic health records (EHRs) has long been desired by the research and healthcare AI communities. From the standardization perspective, community-based standards such as the Fast Healthcare Interoperability Resources (FHIR) and the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) are increasingly used to represent and standardize EHR data for clinical data analytics; however, the potential of such standards for building CKGs has not been well investigated. OBJECTIVE: To develop and evaluate methods and tools that expose OMOP CDM-based clinical data repositories as virtual clinical KGs that are compliant with the FHIR Resource Description Framework (RDF) specification. METHODS: We developed a system called FHIR-Ontop-OMOP to generate virtual clinical KGs from OMOP relational databases. We leveraged an OMOP CDM-based Medical Information Mart for Intensive Care (MIMIC-III) data repository to evaluate the FHIR-Ontop-OMOP system in terms of the faithfulness of the data transformation and the conformance of the generated CKGs to the FHIR RDF specification. RESULTS: A beta version of the system has been released. More than 100 data element mappings from 11 OMOP CDM clinical data, health system, and vocabulary tables were implemented in the system, covering 11 FHIR resources. The virtual CKG generated from MIMIC-III contains 46,520 instances of FHIR Patient, 716,595 instances of Condition, 1,063,525 instances of Procedure, 24,934,751 instances of MedicationStatement, 365,181,104 instances of Observation, and 4,779,672 instances of CodeableConcept. Patient counts identified by five pairs of SQL (over the MIMIC database) and SPARQL (over the virtual CKG) queries were identical, confirming the faithfulness of the data transformation. The generated CKG in RDF triples for 100 patients was fully conformant with the FHIR RDF specification. CONCLUSION: The FHIR-Ontop-OMOP system can expose an OMOP database as a FHIR-compliant RDF graph. It provides a meaningful use case demonstrating the potential enabled by interoperability between FHIR and the OMOP CDM. Generated clinical KGs in FHIR RDF provide a semantic foundation for enabling explainable AI applications in healthcare.
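The faithfulness check (identical patient counts from paired SQL and SPARQL queries) can be expressed compactly in Python. The sketch below assumes an illustrative SQLite file and a Turtle export of the virtual CKG; both file names are hypothetical, and the actual system queries the Ontop-mediated virtual graph rather than a materialized file.

```python
import sqlite3
from rdflib import Graph

# Count patients in the relational OMOP store (hypothetical SQLite stand-in)
conn = sqlite3.connect("omop.db")  # hypothetical database file
sql_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]

# Count fhir:Patient instances in the RDF graph (hypothetical export of the virtual CKG)
g = Graph().parse("virtual_ckg_sample.ttl", format="turtle")
sparql_count = next(iter(g.query(
    "PREFIX fhir: <http://hl7.org/fhir/> "
    "SELECT (COUNT(?p) AS ?n) WHERE { ?p a fhir:Patient }"
)))[0].toPython()

# The faithfulness check in the paper is exactly this kind of parity assertion
assert sql_count == sparql_count, f"mismatch: SQL={sql_count}, SPARQL={sparql_count}"
print("patient counts match:", sql_count)
```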


Subjects
Artificial Intelligence , Pattern Recognition, Automated , Data Warehousing , Delivery of Health Care , Electronic Health Records , Humans
12.
JAMA Netw Open ; 5(8): e2227134, 2022 08 01.
Article in English | MEDLINE | ID: mdl-35976649

ABSTRACT

Importance: The ThermoCool SmartTouch catheter (ablation catheter with contact force and 6-hole irrigation [CF-I6]) is approved by the US Food and Drug Administration (FDA) for paroxysmal atrial fibrillation (AF) ablation and used in routine clinical practice for persistent AF ablation, although clinical outcomes for this indication are unknown. There is a need to understand whether data from routine clinical practice can be used to conduct regulatory-grade evaluations and support label expansions. Objective: To use health system data to compare the safety and effectiveness of the CF-I6 catheter for persistent AF ablation with the ThermoCool SmartTouch SurroundFlow catheter (ablation catheter with contact force and 56-hole irrigation [CF-I56]), which is approved by the FDA for this indication. Design, Setting, and Participants: This retrospective, comparative-effectiveness cohort study included patients undergoing catheter ablation for persistent AF at Mercy Health or Mayo Clinic from January 1, 2014, to April 30, 2021, with up to a 1-year follow-up using electronic health record data. Exposures: Use of the CF-I6 or CF-I56 catheter. Main Outcomes and Measures: The primary safety outcome was a composite of death, thromboembolic events, and procedural complications within 7 to 90 days. The exploratory effectiveness outcome was a composite of AF-related hospitalization events after a 90-day blanking period. Propensity score weighting was used to balance baseline covariates. Risk differences were estimated between catheter groups and averaged across the 2 health care systems, testing for noninferiority of the CF-I6 vs the CF-I56 catheter with respect to the safety outcome using 2-sided 90% CIs. Results: Overall, 1450 patients (1034 [71.3%] male; 1397 [96.3%] White) underwent catheter ablation for persistent AF, including 949 at Mercy Health (186 CF-I6 and 763 CF-I56; mean [SD] age, 64.9 [9.2] years) and 501 at Mayo Clinic (337 CF-I6 and 164 CF-I56; mean [SD] age, 63.7 [9.5] years). A total of 798 (55.0%) had been treated with class I or III antiarrhythmic drugs before ablation. The safety outcome (CF-I6 - CF-I56) was similar at both Mercy Health (1.3%; 90% CI, -2.1% to 4.6%) and Mayo Clinic (-3.8%; 90% CI, -11.4% to 3.7%); the mean difference was noninferior, with a mean of 0.5% (90% CI, -2.6% to 3.5%; P < .001). The effectiveness was similar at 12 months between the 2 catheter groups (mean risk difference, -1.8%; 90% CI, -7.3% to 3.7%). Conclusions and Relevance: In this cohort study, the CF-I6 catheter met the prespecified noninferiority safety criterion for persistent AF ablation compared with the CF-I56 catheter, and effectiveness was similar. This study demonstrates the ability of electronic health care system data to enable safety and effectiveness evaluations of medical devices.
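Propensity score weighting of the kind used here can be illustrated with stabilized inverse-probability-of-treatment weights and a standardized-mean-difference balance check. The sketch below runs on simulated covariates and catheter assignments; the covariate list, weighting variant, and the |SMD| < 0.1 balance threshold are assumptions for illustration, not details taken from the study.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 1450
# Simulated baseline covariates and catheter assignment (1 = CF-I6, 0 = CF-I56)
X = pd.DataFrame({"age": rng.normal(64, 9, n), "male": rng.integers(0, 2, n),
                  "prior_aad": rng.integers(0, 2, n)})
t = rng.integers(0, 2, n)

# Stabilized inverse-probability-of-treatment weights from a propensity model
ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
w = np.where(t == 1, t.mean() / ps, (1 - t.mean()) / (1 - ps))

def smd(col, t, w):
    # Weighted standardized mean difference; |SMD| < 0.1 is a common balance threshold
    m1 = np.average(col[t == 1], weights=w[t == 1])
    m0 = np.average(col[t == 0], weights=w[t == 0])
    s = np.sqrt((col[t == 1].var() + col[t == 0].var()) / 2)
    return (m1 - m0) / s

for c in X.columns:
    print(c, round(smd(X[c].to_numpy(), t, w), 3))
```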


Subjects
Atrial Fibrillation , Catheter Ablation , Aged , Atrial Fibrillation/surgery , Catheter Ablation/methods , Catheters , Cohort Studies , Female , Humans , Male , Middle Aged , Retrospective Studies , Treatment Outcome
13.
J Am Med Inform Assoc ; 29(9): 1449-1460, 2022 08 16.
Article in English | MEDLINE | ID: mdl-35799370

ABSTRACT

OBJECTIVES: To develop and validate a standards-based phenotyping tool to author electronic health record (EHR)-based phenotype definitions and demonstrate execution of the definitions against heterogeneous clinical research data platforms. MATERIALS AND METHODS: We developed an open-source, standards-compliant phenotyping tool known as the PhEMA Workbench that enables phenotype representation using the Fast Healthcare Interoperability Resources (FHIR) and Clinical Quality Language (CQL) standards. We then demonstrated how this tool can be used to conduct EHR-based phenotyping, including phenotype authoring, execution, and validation. We validated the performance of the tool by executing a thrombotic event phenotype definition at 3 sites, Mayo Clinic (MC), Northwestern Medicine (NM), and Weill Cornell Medicine (WCM), and used manual review to determine precision and recall. RESULTS: An initial version of the PhEMA Workbench has been released, which supports phenotype authoring, execution, and publishing to a shared phenotype definition repository. The resulting thrombotic event phenotype definition consisted of 11 CQL statements and 24 value sets containing a total of 834 codes. Technical validation showed satisfactory performance (both NM and MC had 100% precision and recall, and WCM had a precision of 95% and a recall of 84%). CONCLUSIONS: We demonstrate that the PhEMA Workbench can facilitate EHR-driven phenotype definition, execution, and phenotype sharing in heterogeneous clinical research data environments. A phenotype definition that integrates with existing standards-compliant systems and uses a formal representation facilitates automation and can decrease the potential for human error.
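A toy illustration of the value-set-driven logic behind such a phenotype definition, together with the precision/recall computation against manual review. The codes and patient records below are invented; a real execution would evaluate the CQL definition against FHIR resources via the PhEMA Workbench rather than plain Python sets.

```python
# Toy value-set-driven phenotype check, in the spirit of a CQL value-set criterion.
# Codes and patient records below are invented for illustration.
THROMBOTIC_EVENT_CODES = {"I26.99", "I82.401", "I63.9"}   # hypothetical ICD-10-CM value set

patients = {
    "p1": {"I82.401", "E11.9"},
    "p2": {"E78.5"},
    "p3": {"I63.9"},
}
computed = {pid for pid, codes in patients.items() if codes & THROMBOTIC_EVENT_CODES}

# Manual chart review is the reference standard used to compute precision and recall
reviewed_positive = {"p1", "p3"}
tp = len(computed & reviewed_positive)
precision = tp / len(computed)
recall = tp / len(reviewed_positive)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```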


Subjects
Electronic Health Records , Polyhydroxyethyl Methacrylate , Humans , Language , Phenotype
14.
Article in English | MEDLINE | ID: mdl-35853070

ABSTRACT

Identification of autism spectrum disorder (ASD) in children is challenging due to the complexity and heterogeneity of ASD. Currently, most existing methods rely on a single modality with limited information and often cannot achieve satisfactory performance. To address this issue, this paper considers internal neurophysiological and external behavioral perspectives simultaneously and proposes a new multimodal diagnosis framework for identifying ASD in children by fusing electroencephalogram (EEG) and eye-tracking (ET) data. Specifically, we designed a two-step multimodal feature learning and fusion model based on a typical deep learning algorithm, the stacked denoising autoencoder (SDAE). In the first step, two SDAE models perform feature learning for the EEG and ET modalities, respectively. In the second step, a third SDAE model performs multimodal fusion on the concatenated EEG and ET features. Our multimodal identification model can automatically capture correlations and complementarity between the behavioral and neurophysiological modalities in a latent feature space, and generates informative feature representations with better discriminability and generalization for enhanced identification performance. We collected a multimodal dataset containing 40 children with ASD and 50 typically developing (TD) children to evaluate the proposed method. Experimental results showed that our method achieved superior performance compared with two unimodal methods and a simple feature-level fusion method, and has promising potential to provide an objective and accurate diagnosis to assist clinicians.
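The two-step fusion idea can be sketched with one small denoising autoencoder per modality and a third over the concatenated codes. The PyTorch sketch below uses single-layer encoders and made-up feature dimensions as a stand-in for the stacked architecture, so it shows the data flow rather than the paper's model.

```python
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    """Single denoising autoencoder block: corrupt the input, reconstruct it, expose the code."""
    def __init__(self, in_dim, code_dim, noise=0.2):
        super().__init__()
        self.noise = noise
        self.enc = nn.Sequential(nn.Linear(in_dim, code_dim), nn.ReLU())
        self.dec = nn.Linear(code_dim, in_dim)

    def forward(self, x):
        z = self.enc(x + self.noise * torch.randn_like(x))
        return z, self.dec(z)

# Step 1: one autoencoder per modality (dimensions are illustrative, not the paper's)
eeg_ae, et_ae = DenoisingAE(128, 32), DenoisingAE(64, 16)
# Step 2: a fusion autoencoder over the concatenated modality codes, plus a classifier
fusion_ae = DenoisingAE(32 + 16, 24)
clf = nn.Linear(24, 2)

eeg, et = torch.randn(8, 128), torch.randn(8, 64)
z_eeg, _ = eeg_ae(eeg)
z_et, _ = et_ae(et)
z_fused, _ = fusion_ae(torch.cat([z_eeg, z_et], dim=1))
print(clf(z_fused).shape)  # torch.Size([8, 2]) -> ASD vs TD logits
```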


Subjects
Autism Spectrum Disorder , Algorithms , Autism Spectrum Disorder/diagnosis , Child , Electroencephalography , Humans
15.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35649342

ABSTRACT

Internal validation is the most popular evaluation strategy used for drug-target predictive models. The simple random shuffling used in cross-validation, however, is not always ideal for handling large, diverse, and copious datasets, as it could potentially introduce bias. Hence, these predictive models cannot be comprehensively evaluated to provide insight into their general performance across a variety of use cases (e.g. permutations of different levels of connectivity and categories in drug and target space, as well as validations based on different data sources). In this work, we introduce a benchmark, BETA, that aims to address this gap by (i) providing an extensive multipartite network consisting of 0.97 million biomedical concepts and 8.5 million associations, in addition to 62 million drug-drug and protein-protein similarities, and (ii) presenting evaluation strategies that reflect seven cases (i.e. general, screening with different connectivity, target and drug screening based on categories, searching for specific drugs and targets, and drug repurposing for specific diseases), for a total of seven Tests (consisting of 344 Tasks) across multiple sampling and validation strategies. Six state-of-the-art methods covering two broad input data types (chemical structure- and gene sequence-based, and network-based) were tested across all the developed Tasks. The best- and worst-performing cases were analyzed to demonstrate the ability of the proposed benchmark to identify limitations of the tested methods on the benchmark tasks. The results highlight BETA as a benchmark for the selection of computational strategies for drug repurposing and target discovery.


Subjects
Benchmarking , Drug Development , Algorithms , Drug Evaluation, Preclinical , Drug Repositioning/methods , Proteins/genetics
17.
J Med Internet Res ; 24(7): e38584, 2022 07 06.
Article in English | MEDLINE | ID: mdl-35658098

ABSTRACT

BACKGROUND: Multiple types of biomedical association knowledge graphs, including COVID-19-related ones, are constructed from co-occurring biomedical entities retrieved from recent literature. However, applications derived from these raw graphs (eg, association predictions among genes, drugs, and diseases) have a high probability of false-positive predictions, as co-occurrence in the literature does not always mean there is a true biomedical association between two entities. OBJECTIVE: Data quality plays an important role in training deep neural network models; however, most of the current work in this area has focused on improving a model's performance with the assumption that the preprocessed data are clean. Here, we studied how to remove noise from raw knowledge graphs with limited labeled information. METHODS: The proposed framework used generative deep neural networks to generate a graph that can distinguish the unknown associations in the raw training graph. Two generative adversarial network models, NetGAN and Cross-Entropy Low-rank Logits (CELL), were adopted for the edge classification (ie, link prediction), leveraging unlabeled link information based on a real knowledge graph built from LitCovid and Pubtator. RESULTS: The performance of link prediction, especially in the extreme case of training data versus test data at a ratio of 1:9, demonstrated that the proposed method still achieved favorable results (area under the receiver operating characteristic curve >0.8 for the synthetic data set and 0.7 for the real data set), despite the limited amount of testing data available. CONCLUSIONS: Our preliminary findings showed that the proposed framework achieved promising results for removing noise during data preprocessing of the biomedical knowledge graph, potentially improving the performance of downstream applications by providing cleaner data.
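The evaluation setting (edge classification under an extreme 1:9 train/test split, scored by ROC AUC) can be sketched with a simple stand-in scorer in place of NetGAN or CELL. The graph below is a synthetic toy, and common-neighbour counting is only a placeholder scoring rule used to make the evaluation loop concrete.

```python
import random
import networkx as nx
from sklearn.metrics import roc_auc_score

random.seed(0)
# Toy co-occurrence graph standing in for the literature-derived knowledge graph
g = nx.barabasi_albert_graph(300, 3, seed=0)
edges = list(g.edges())
random.shuffle(edges)

# Extreme split mirroring the 1:9 train/test scenario: keep only 10% of edges for training
n_train = int(0.1 * len(edges))
train_g = nx.Graph()
train_g.add_nodes_from(g.nodes())
train_g.add_edges_from(edges[:n_train])

pos = edges[n_train:]                        # held-out true edges
neg = list(nx.non_edges(g))[: len(pos)]      # sampled non-edges as negatives

def score(u, v):
    # Common-neighbour count as a simple stand-in scorer for NetGAN/CELL
    return len(list(nx.common_neighbors(train_g, u, v)))

y_true = [1] * len(pos) + [0] * len(neg)
y_score = [score(u, v) for u, v in pos + neg]
print("link-prediction AUC:", round(roc_auc_score(y_true, y_score), 3))
```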


Subjects
COVID-19 , Humans , Knowledge , Neural Networks, Computer , Pattern Recognition, Automated , ROC Curve
18.
Drug Saf ; 45(5): 459-476, 2022 05.
Article in English | MEDLINE | ID: mdl-35579811

ABSTRACT

Monitoring adverse drug events, or pharmacovigilance, has been promoted by the World Health Organization to assure the safety of medicines through timely and reliable information exchange regarding drug safety issues. We aim to discuss the application of machine learning methods as well as causal inference paradigms in pharmacovigilance. We first reviewed data sources for pharmacovigilance. Then, we examined traditional causal inference paradigms, their applications in pharmacovigilance, and how machine learning methods and causal inference paradigms have been integrated to enhance the performance of traditional causal inference paradigms. Finally, we summarized issues with currently mainstream correlation-based machine learning models and how the machine learning community has tried to address these issues by incorporating causal inference paradigms. Our literature search revealed that most existing data sources and tasks for pharmacovigilance were not designed for causal inference. Additionally, pharmacovigilance has lagged in adopting integrated machine learning-causal inference models. We highlight several currently trending directions or gaps in integrating causal inference with machine learning in pharmacovigilance research. Our literature search also revealed that the adoption of causal paradigms can mitigate known issues with machine learning models. We foresee that the pharmacovigilance domain can benefit from progress in the machine learning field.


Subjects
Drug-Related Side Effects and Adverse Reactions , Pharmacovigilance , Causality , Drug-Related Side Effects and Adverse Reactions/epidemiology , Humans , Machine Learning , Models, Theoretical
20.
J Biomed Inform ; 127: 104002, 2022 03.
Article in English | MEDLINE | ID: mdl-35077901

ABSTRACT

OBJECTIVE: The large-scale collection of observational data and digital technologies could help curb the COVID-19 pandemic. However, the coexistence of multiple Common Data Models (CDMs) and the lack of data extract, transform, and load (ETL) tools between different CDMs cause potential interoperability issues between data systems. The objective of this study is to design, develop, and evaluate an ETL tool that transforms data in the PCORnet CDM format into the OMOP CDM. METHODS: We developed an open-source ETL tool to facilitate data conversion from the PCORnet CDM to the OMOP CDM. The ETL tool was evaluated using a dataset of 1000 patients randomly selected from the PCORnet CDM at Mayo Clinic. Information loss, data mapping accuracy, and gap analysis approaches were used to assess the performance of the ETL tool. We designed an experiment around a real-world COVID-19 surveillance task to assess the feasibility of the ETL tool, and we also assessed the capacity of the ETL tool for COVID-19 data surveillance using the data collection criteria of the MN EHR Consortium COVID-19 project. RESULTS: After the ETL process, all records of the 1000 patients from 18 PCORnet CDM tables were successfully transformed into 12 OMOP CDM tables. The information loss for all concept mappings was less than 0.61%. The string mapping process for the unit concepts lost 2.84% of records. Almost all fields in the manual mapping process achieved 0% information loss, except the specialty concept mapping. Moreover, the mapping accuracy for all fields was 100%. The COVID-19 surveillance task collected almost the same set of cases (99.3% overlap) from the original PCORnet CDM and the target OMOP CDM. Finally, all data elements for the MN EHR Consortium COVID-19 project could be captured from both the PCORnet CDM and the OMOP CDM. CONCLUSION: We demonstrated that our ETL tool satisfies the data conversion requirements between the PCORnet CDM and the OMOP CDM. This work will facilitate data retrieval, communication, sharing, and analysis between different institutions, not only for COVID-19-related projects but also for other real-world evidence-based observational studies.
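Information loss as used here (the share of source records that fail to map to a target concept) reduces to a one-line pandas computation. The codes and concept map below are invented for illustration and are not drawn from the actual PCORnet-to-OMOP mapping tables.

```python
import pandas as pd

# Toy source (PCORnet-style) and target (OMOP-style) concept columns; values are invented
source = pd.DataFrame({"px_code": ["99213", "99214", "XXXXX", "93000"]})
mapping = {"99213": 2514406, "99214": 2514407, "93000": 2106349}  # hypothetical concept map

target_concepts = source["px_code"].map(mapping)

# Information loss = share of source records with no target concept after the ETL step
information_loss = target_concepts.isna().mean()
print(f"information loss: {information_loss:.2%}")   # 25.00% in this toy example
```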


Subjects
COVID-19 , COVID-19/epidemiology , Databases, Factual , Electronic Health Records , Humans , Information Storage and Retrieval , Pandemics , SARS-CoV-2