Results 1 - 20 of 120
1.
Brief Bioinform ; 2022 Jun 02.
Article in English | MEDLINE | ID: mdl-35649342

ABSTRACT

Internal validation is the most popular evaluation strategy used for drug-target predictive models. However, the simple random shuffling used in cross-validation is not always ideal for handling large, diverse, and copious datasets, as it can introduce bias. As a result, these predictive models cannot be comprehensively evaluated to provide insight into their general performance across a variety of use cases (e.g., permutations of different levels of connectivity and categories in drug and target space, as well as validations based on different data sources). In this work, we introduce a benchmark, BETA, that aims to address this gap by (i) providing an extensive multipartite network consisting of 0.97 million biomedical concepts and 8.5 million associations, in addition to 62 million drug-drug and protein-protein similarities, and (ii) presenting evaluation strategies that reflect seven cases (i.e., general, screening with different connectivity, target and drug screening based on categories, searching for specific drugs and targets, and drug repurposing for specific diseases), organized as seven Tests consisting of 344 Tasks in total across multiple sampling and validation strategies. Six state-of-the-art methods covering two broad input data types (chemical structure- and gene sequence-based, and network-based) were tested across all the developed Tasks. The best- and worst-performing cases were analyzed to demonstrate the ability of the proposed benchmark to identify limitations of the tested methods. The results highlight BETA as a useful benchmark for selecting computational strategies for drug repurposing and target discovery.
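To make the sampling concern above concrete, here is a minimal sketch (not BETA's actual code) of a "cold-start" split that holds out every pair involving selected drugs, one connectivity-aware alternative to simple random shuffling; the tuple format and function name are illustrative.

```python
# Illustrative cold-start split: test drugs never appear in training.
# Pair data are assumed to be (drug_id, target_id, label) tuples.
import random

def cold_start_drug_split(pairs, test_fraction=0.2, seed=42):
    """Split drug-target pairs so test drugs are unseen during training."""
    rng = random.Random(seed)
    drugs = sorted({d for d, _, _ in pairs})
    rng.shuffle(drugs)
    n_test = int(len(drugs) * test_fraction)
    test_drugs = set(drugs[:n_test])
    train = [p for p in pairs if p[0] not in test_drugs]
    test = [p for p in pairs if p[0] in test_drugs]
    return train, test

pairs = [("d1", "t1", 1), ("d1", "t2", 0), ("d2", "t1", 1), ("d3", "t3", 1)]
train, test = cold_start_drug_split(pairs, test_fraction=0.34)
print(len(train), "train pairs,", len(test), "test pairs")
```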

3.
J Med Internet Res ; 2022 May 30.
Article in English | MEDLINE | ID: mdl-35658098

ABSTRACT

BACKGROUND: The multiple types of biomedical associations in knowledge graphs, including COVID-19-related ones, are constructed from co-occurring biomedical entities retrieved from recent literature. However, applications derived from these raw graphs (e.g., association predictions among genes, drugs, and diseases) carry a high probability of false-positive predictions, as co-occurrence in the literature does not always indicate a true biomedical association between two entities. OBJECTIVE: Data quality plays an important role in training deep neural network models; however, most current work in this area has focused on improving a model's performance under the assumption that the preprocessed data are clean. Here, we studied how to remove noise from raw knowledge graphs with limited labeled information. METHODS: The proposed framework utilized generative deep neural networks to generate a graph that can distinguish the unknown associations in the raw training graph. Two generative adversarial network models, NetGAN and CELL, were adopted for the edge classification (i.e., link prediction), leveraging unlabeled link information based on a real knowledge graph built from LitCovid and Pubtator. RESULTS: The link-prediction performance, especially in the extreme case of a 1:9 ratio of training to test data, demonstrated that the proposed method still achieved favorable results (AUC-ROC > 0.8 for the synthetic dataset and > 0.7 for the real dataset) despite the limited amount of training data available. CONCLUSIONS: Our preliminary findings showed that the proposed framework achieved promising results for removing noise during data preprocessing of the biomedical knowledge graph, potentially improving the performance of downstream applications by providing cleaner data.
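For illustration only, the following sketch shows the link-prediction evaluation pattern the study relies on, with a common-neighbor heuristic standing in for NetGAN/CELL edge scores and a toy networkx graph standing in for the LitCovid/Pubtator knowledge graph.

```python
# Hold out true edges, sample non-edges as negatives, and score both sets.
import networkx as nx
from sklearn.metrics import roc_auc_score

G = nx.karate_club_graph()                # toy stand-in for the raw knowledge graph
held_out = list(G.edges())[:10]           # true edges hidden from the model
G.remove_edges_from(held_out)
hidden = {tuple(sorted(e)) for e in held_out}
negatives = [e for e in nx.non_edges(G) if tuple(sorted(e)) not in hidden][:10]

def edge_score(u, v):                     # proxy for a generative model's edge score
    return len(list(nx.common_neighbors(G, u, v)))

y_true = [1] * len(held_out) + [0] * len(negatives)
y_score = [edge_score(u, v) for u, v in held_out + negatives]
print("AUC-ROC:", roc_auc_score(y_true, y_score))
```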

4.
Drug Saf ; 45(5): 459-476, 2022 05.
Article in English | MEDLINE | ID: mdl-35579811

ABSTRACT

Monitoring adverse drug events, or pharmacovigilance, has been promoted by the World Health Organization to assure the safety of medicines through timely and reliable information exchange regarding drug safety issues. We aim to discuss the application of machine learning methods as well as causal inference paradigms in pharmacovigilance. We first reviewed data sources for pharmacovigilance. Then, we examined traditional causal inference paradigms, their applications in pharmacovigilance, and how machine learning methods and causal inference paradigms have been integrated to enhance the performance of traditional causal inference. Finally, we summarized issues with the currently mainstream correlation-based machine learning models and how the machine learning community has tried to address these issues by incorporating causal inference paradigms. Our literature search revealed that most existing data sources and tasks for pharmacovigilance were not designed for causal inference. Additionally, pharmacovigilance has lagged in adopting integrated machine learning-causal inference models. We highlight several currently trending directions, or gaps, for integrating causal inference with machine learning in pharmacovigilance research. Our search also revealed that adopting causal paradigms can mitigate known issues with machine learning models. We foresee that the pharmacovigilance domain can benefit from progress in the machine learning field.


Subjects
Drug-Related Side Effects and Adverse Reactions; Pharmacovigilance; Causality; Drug-Related Side Effects and Adverse Reactions/epidemiology; Humans; Machine Learning; Models, Theoretical
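As a concrete example of the causal inference paradigms the review discusses, here is a minimal sketch of inverse probability weighting (IPW) with a machine-learned propensity model; the simulated exposure/outcome data and variable meanings are invented for illustration.

```python
# IPW sketch: X holds confounders, t drug exposure, y the adverse event.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                                    # confounders
t = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)             # exposure
y = (0.5 * t + X[:, 0] + rng.normal(size=1000) > 0).astype(int)   # outcome

ps = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]  # propensity scores
ate = np.mean(t * y / ps) - np.mean((1 - t) * y / (1 - ps))  # weighted risk difference
print("Estimated average treatment effect:", round(ate, 3))
```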
6.
J Biomed Inform ; 127: 104002, 2022 03.
Article in English | MEDLINE | ID: mdl-35077901

ABSTRACT

OBJECTIVE: The large-scale collection of observational data and digital technologies could help curb the COVID-19 pandemic. However, the coexistence of multiple Common Data Models (CDMs) and the lack of data extract, transform, and load (ETL) tools between different CDMs cause potential interoperability issues between data systems. The objective of this study is to design, develop, and evaluate an ETL tool that transforms PCORnet CDM data into the OMOP CDM. METHODS: We developed an open-source ETL tool to facilitate data conversion between the PCORnet CDM and the OMOP CDM. The ETL tool was evaluated using a dataset of 1000 patients randomly selected from the PCORnet CDM at Mayo Clinic. Information loss, data mapping accuracy, and gap analysis approaches were used to assess the performance of the ETL tool. We designed an experiment conducting a real-world COVID-19 surveillance task to assess the feasibility of the ETL tool, and we also assessed the tool's capacity for COVID-19 data surveillance using the data collection criteria of the MN EHR Consortium COVID-19 project. RESULTS: After the ETL process, all records of the 1000 patients from 18 PCORnet CDM tables were successfully transformed into 12 OMOP CDM tables. The information loss for all concept mapping was less than 0.61%. The string mapping process for the unit concepts lost 2.84% of records. Almost all fields in the manual mapping process achieved 0% information loss, except the specialty concept mapping. Moreover, the mapping accuracy for all fields was 100%. The COVID-19 surveillance task collected almost the same set of cases (99.3% overlap) from the original PCORnet CDM and the target OMOP CDM separately. Finally, all data elements for the MN EHR Consortium COVID-19 project could be captured from both the PCORnet CDM and the OMOP CDM. CONCLUSION: We demonstrated that our ETL tool satisfies the data conversion requirements between the PCORnet CDM and the OMOP CDM. This work will facilitate data retrieval, communication, sharing, and analysis between institutions, not only for COVID-19-related projects but also for other real-world, evidence-based observational studies.


Subjects
COVID-19; COVID-19/epidemiology; Databases, Factual; Electronic Health Records; Humans; Information Storage and Retrieval; Pandemics; SARS-CoV-2
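A hedged sketch of the core ETL step described above: mapping a PCORnet demographic record into the OMOP person table through a concept map. The mapping shown (PCORnet sex codes to OMOP gender concept IDs 8532/8507) is a simplified illustration, not the tool's actual mapping tables.

```python
# Toy PCORnet DEMOGRAPHIC -> OMOP person transform with an information-loss check.
import pandas as pd

SEX_MAP = {"F": 8532, "M": 8507}   # PCORnet sex codes -> OMOP gender concept IDs

pcornet_demographic = pd.DataFrame(
    {"patid": ["p1", "p2"],
     "birth_date": ["1970-01-01", "1985-06-15"],
     "sex": ["F", "M"]}
)

omop_person = pd.DataFrame(
    {"person_id": pcornet_demographic["patid"],
     "gender_concept_id": pcornet_demographic["sex"].map(SEX_MAP),
     "birth_datetime": pd.to_datetime(pcornet_demographic["birth_date"])}
)
unmapped = omop_person["gender_concept_id"].isna().mean()  # information-loss metric
print(omop_person, f"\ninformation loss: {unmapped:.2%}")
```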
7.
J Am Med Inform Assoc ; 28(11): 2313-2324, 2021 10 12.
Article in English | MEDLINE | ID: mdl-34505903

ABSTRACT

OBJECTIVE: The study sought to test the feasibility of conducting a phenome-wide association study to characterize phenotypic abnormalities associated with individuals at high risk for lung cancer using electronic health records. MATERIALS AND METHODS: We used the beta release of the All of Us Researcher Workbench, with clinical and survey data from a population of 225 000 subjects. We identified 3 cohorts of individuals at high risk of developing lung cancer based on (1) the 2013 U.S. Preventive Services Task Force criteria, (2) the long-term quitters of cigarette smoking criteria, and (3) the younger age of onset criteria. We applied logistic regression analysis to identify significant associations between individuals' phenotypes and their risk categories. We validated our findings against a lung cancer cohort from the same population and conducted an expert review to determine whether these associations were known or potentially novel. RESULTS: We found a total of 214 statistically significant associations (P < .05 with a Bonferroni correction and odds ratio > 1.5) enriched in the high-risk individuals from the 3 cohorts, and 15 enriched in the low-risk individuals. Forty significant associations enriched in the high-risk individuals and 13 enriched in the low-risk individuals were validated in the cancer cohort. Expert review identified 15 potentially new associations enriched in the high-risk individuals. CONCLUSIONS: It is feasible to conduct a phenome-wide association study to characterize phenotypic abnormalities in individuals at high risk of developing lung cancer using electronic health records. The All of Us Researcher Workbench is a promising resource for research studies that evaluate and optimize lung cancer screening criteria.


Subjects
Lung Neoplasms; Population Health; Early Detection of Cancer; Electronic Health Records; Genome-Wide Association Study; Humans; Lung Neoplasms/epidemiology; Phenotype
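The association test described above can be sketched as follows: one logistic regression per phenotype, keeping associations that pass a Bonferroni-corrected P threshold with an odds ratio above 1.5. Data here are simulated placeholders, not All of Us data.

```python
# Per-phenotype logistic regression with Bonferroni correction and OR filter.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, n_phenotypes = 500, 20
phenotypes = rng.integers(0, 2, size=(n, n_phenotypes))   # 0/1 phenotype flags
high_risk = rng.integers(0, 2, size=n)                    # risk-group label

alpha = 0.05 / n_phenotypes                               # Bonferroni correction
for j in range(n_phenotypes):
    X = sm.add_constant(phenotypes[:, j].astype(float))
    fit = sm.Logit(high_risk, X).fit(disp=0)
    odds_ratio = np.exp(fit.params[1])
    if fit.pvalues[1] < alpha and odds_ratio > 1.5:
        print(f"phenotype {j}: OR={odds_ratio:.2f}, p={fit.pvalues[1]:.2e}")
```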
8.
AMIA Jt Summits Transl Sci Proc ; 2021: 410-419, 2021.
Article in English | MEDLINE | ID: mdl-34457156

ABSTRACT

HL7 Fast Healthcare Interoperability Resources (FHIR) is one of the current data standards enabling electronic healthcare information exchange. Previous studies have shown that FHIR is capable of modeling both structured and unstructured data from electronic health records (EHRs). However, the capability of FHIR to enable clinical data analytics has not been well investigated. The objective of this study is to demonstrate how FHIR-based representations of unstructured EHR data can be ported to deep learning models for text classification in clinical phenotyping. We leverage and extend the NLP2FHIR clinical data normalization pipeline and conduct a case study with two obesity datasets. We tested several deep learning-based text classifiers, such as convolutional neural networks, gated recurrent units, and text graph convolutional networks, on both raw text and NLP2FHIR inputs. We found that the combination of NLP2FHIR input and text graph convolutional networks had the highest F1 score. Therefore, FHIR-based deep learning methods have the potential to be leveraged in supporting EHR phenotyping, making phenotyping algorithms more portable across EHR systems and institutions.


Subjects
Deep Learning; Algorithms; Electronic Health Records; Humans; Obesity; Pilot Projects
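As a sketch of one classifier family tested above, here is a minimal PyTorch convolutional text classifier over token embeddings; the vocabulary size, dimensions, and two-class obesity label are illustrative, and NLP2FHIR-derived inputs would replace the random token IDs.

```python
# Minimal convolutional text classifier (one conv layer + global max pooling).
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=64, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, 128, kernel_size=3, padding=1)
        self.fc = nn.Linear(128, n_classes)

    def forward(self, token_ids):                       # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)       # (batch, embed, seq)
        x = torch.relu(self.conv(x)).max(dim=2).values  # global max pooling
        return self.fc(x)

model = TextCNN()
logits = model(torch.randint(0, 5000, (4, 100)))        # 4 documents, 100 tokens
print(logits.shape)                                     # torch.Size([4, 2])
```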
9.
AMIA Jt Summits Transl Sci Proc ; 2021: 624-633, 2021.
Article in English | MEDLINE | ID: mdl-34457178

ABSTRACT

Lack of standardized representation of natural language processing (NLP) components in phenotyping algorithms hinders the portability of those algorithms and their execution in a high-throughput and reproducible manner. The objective of this study is to develop and evaluate a standards-driven approach, CQL4NLP, that integrates a collection of NLP extensions represented in the HL7 Fast Healthcare Interoperability Resources (FHIR) standard into the Clinical Quality Language (CQL). A minimal NLP data model with 11 NLP-specific data elements was created, including six FHIR NLP extensions. All 11 data elements were identified from their usage in real-world phenotyping algorithms. An NLP ruleset generation mechanism was integrated into the NLP2FHIR pipeline, and the generated rulesets achieved comparable performance in a case study identifying obesity comorbidities. The NLP ruleset generation mechanism created a reproducible process for defining the NLP components of a phenotyping algorithm and their execution.


Subjects
Electronic Health Records; Natural Language Processing; Algorithms; Comorbidity; Humans; Language
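To illustrate the kind of artifact such a minimal NLP data model standardizes, here is a hypothetical FHIR Condition resource carrying NLP-specific extensions, expressed as a Python dict; the extension URLs and element names are invented for illustration and are not the published CQL4NLP definitions.

```python
# Hypothetical FHIR Condition with NLP extensions (URLs/fields are invented).
nlp_condition = {
    "resourceType": "Condition",
    "code": {"coding": [{"system": "http://snomed.info/sct",
                         "code": "414916001",       # SNOMED CT: Obesity
                         "display": "Obesity"}]},
    "extension": [
        {"url": "http://example.org/fhir/StructureDefinition/nlp-certainty",
         "valueCode": "positive"},                   # NLP assertion status
        {"url": "http://example.org/fhir/StructureDefinition/nlp-source-text",
         "valueString": "pt is morbidly obese"},     # evidence span from the note
    ],
}
print(nlp_condition["extension"][0]["valueCode"])
```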
10.
J Am Med Inform Assoc ; 28(10): 2241-2250, 2021 09 18.
Article in English | MEDLINE | ID: mdl-34313748

ABSTRACT

OBJECTIVE: The study sought to conduct an informatics analysis of the National Evaluation System for Health Technology Coordinating Center test case on cardiac ablation catheters, and to demonstrate the role of informatics approaches in assessing the feasibility of capturing, from electronic health records and other health information technology systems in a multicenter evaluation, real-world data identified through unique device identifiers (UDIs) that are fit for the purpose of label extensions for 2 cardiac ablation catheters. MATERIALS AND METHODS: We focused on the data capture and transformation and the data quality maturity model specified in the National Evaluation System for Health Technology Coordinating Center data quality framework. The informatics analysis included 4 elements: the use of UDIs for identifying device exposure data, the use of standardized codes for defining computable phenotypes, the use of natural language processing for capturing unstructured data elements from clinical data systems, and the use of common data models for standardizing data collection and analyses. RESULTS: We found that, with UDI implementation at 3 health systems, the target device exposure data could be effectively identified, particularly for brand-specific devices. Computable phenotypes for the study outcomes could be defined using codes; however, ablation registries, natural language processing tools, and chart reviews were required to validate the data quality of the phenotypes. Common data model implementation status varied across sites. The maturity level of the key informatics technologies was highly aligned with the data quality maturity model. CONCLUSIONS: We demonstrated that informatics approaches can feasibly be used to capture safety and effectiveness outcomes in real-world data for use in medical device studies supporting label extensions.


Subjects
Electronic Health Records; Health Information Systems; Feasibility Studies; Informatics; Natural Language Processing
11.
JMIR Med Inform ; 9(5): e23586, 2021 May 25.
Article in English | MEDLINE | ID: mdl-34032581

ABSTRACT

BACKGROUND: Precision oncology has the potential to leverage clinical and genomic data in advancing disease prevention, diagnosis, and treatment. A key research area focuses on the early detection of primary cancers and the potential prediction of cancers of unknown primary in order to facilitate optimal treatment decisions. OBJECTIVE: This study presents a methodology to harmonize phenotypic and genetic data features to classify primary cancer types and predict cancers of unknown primary. METHODS: We extracted genetic data elements from the oncology genetic reports of 1011 patients with cancer and their corresponding phenotypic data from Mayo Clinic's electronic health records. We modeled both the genetic and electronic health record data with HL7 Fast Healthcare Interoperability Resources. The semantic web Resource Description Framework (RDF) was employed to generate a network-based data representation (i.e., a patient-phenotypic-genetic network). Based on the RDF data graph, the Node2vec graph-embedding algorithm was applied to generate features. Multiple machine learning and deep learning backbone models were compared for cancer prediction performance. RESULTS: Across the 6 machine learning tasks designed in the experiment, the proposed method achieved favorable results in classifying primary cancer types (area under the receiver operating characteristic curve [AUROC] of 96.56% on average across all 9 cancer predictions, based on cross-validation) and predicting unknown primaries (AUROC of 80.77% on average across all 8 cancer predictions in real-patient validation). To demonstrate interpretability, the 17 phenotypic and genetic features that contributed most to the prediction of each cancer were identified and validated through a literature review. CONCLUSIONS: Accurate prediction of cancer types can be achieved with existing electronic health record data with satisfactory precision. The integration of genetic reports improves prediction, illustrating the translational value of incorporating genetic tests early at the diagnosis stage for patients with cancer.
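A sketch of the graph-embedding step, assuming the community `node2vec` package rather than the authors' implementation; the toy graph stands in for the RDF patient-phenotypic-genetic network.

```python
# node2vec embeddings over a toy patient-phenotype-gene graph.
import networkx as nx
from node2vec import Node2Vec   # assumption: pip package `node2vec`

G = nx.Graph()
G.add_edges_from([("patient1", "phenotype:obesity"),
                  ("patient1", "gene:KRAS"),
                  ("patient2", "gene:KRAS")])

n2v = Node2Vec(G, dimensions=32, walk_length=10, num_walks=50, seed=7)
model = n2v.fit(window=5, min_count=1)     # gensim Word2Vec under the hood
patient_vector = model.wv["patient1"]      # feature vector for a downstream classifier
print(patient_vector.shape)                # (32,)
```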

12.
J Biomed Inform ; 117: 103755, 2021 05.
Article in English | MEDLINE | ID: mdl-33781919

ABSTRACT

Resource Description Framework (RDF) is one of the three standardized data formats in the HL7 Fast Healthcare Interoperability Resources (FHIR) specification and is being used by healthcare and research organizations to join FHIR and non-FHIR data. However, RDF had not previously been integrated into popular FHIR tooling packages, hindering the adoption of FHIR RDF in the semantic web and other communities. The objective of this study is to develop and evaluate a Java-based FHIR RDF data transformation toolkit to facilitate the use and validation of FHIR RDF data. We extended the popular HAPI FHIR tooling to add RDF support, enabling FHIR data in XML or JSON to be transformed to or from RDF. We also developed an RDF Shape Expression (ShEx)-based validation framework to verify conformance of FHIR RDF data to the ShEx schemas provided in the FHIR specification for versions R4 and R5. The effectiveness of the ShEx validation was demonstrated by testing it against the 2693 FHIR R4 examples and 2197 FHIR R5 examples included in the FHIR specification. Five types of errors were revealed in the R5 examples: missing properties, unknown elements, missing resource types, invalid attribute values, and unknown resource names, demonstrating the value of ShEx for quality assurance of the evolving R5 development. This FHIR RDF data transformation and validation framework, based on HAPI and ShEx, is robust and ready for community use in adopting FHIR RDF, improving FHIR data quality, and evolving the FHIR specification.


Subjects
Delivery of Health Care; Electronic Health Records
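Although the paper's framework is Java/HAPI-based, the ShEx validation idea can be sketched in Python assuming the `pyshex` package; the schema and data below are toy stand-ins for the FHIR ShEx schemas and examples, and the exact `pyshex` API shown is an assumption.

```python
# Validate toy RDF data against a toy ShEx schema (stand-ins for FHIR R4/R5).
from pyshex import ShExEvaluator   # assumption: pip package `pyshex`

shex_schema = """
PREFIX ex: <http://example.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
ex:PatientShape { ex:name xsd:string }
"""
rdf_data = """
PREFIX ex: <http://example.org/>
ex:pat1 ex:name "Alice" .
"""
results = ShExEvaluator(rdf=rdf_data, schema=shex_schema,
                        focus="http://example.org/pat1",
                        start="http://example.org/PatientShape").evaluate()
for r in results:                  # each result reports conformance per focus node
    print(r.focus, "conforms" if r.result else f"error: {r.reason}")
```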
13.
Int J Med Inform ; 145: 104308, 2021 01.
Article in English | MEDLINE | ID: mdl-33160272

ABSTRACT

BACKGROUND AND OBJECTIVE: Identification and standardization of the data elements used in clinical trials may control and reduce cost and errors during the operational process, and enable seamless data exchange between electronic data capture (EDC) systems and electronic health record (EHR) systems. This study presents a methodology to comprehensively capture clinical trial data element needs. MATERIALS AND METHODS: Case report forms (CRFs) for clinical trial data collection were used to approximate clinical information needs, and these needs were then mapped to semantically equivalent fields within an existing FHIR cancer profile. Items without a semantically equivalent field were considered information needs that cannot be represented in current standards, and we proposed extensions to support them. RESULTS: We identified 62 discrete items from a preliminary survey of 43 base questions in four CRFs used in colorectal cancer clinical trials, of which 28 items were modeled with FHIR extensions and their associated responses for colorectal cancer. We achieved promising results in the data population of the CRFs, with an average precision of 98.5%, recall of 96.2%, and F-measure of 96.8% across all base questions. We also demonstrated that the auto-filled answers in CRFs can be used to discover patient subgroups using a topic modeling approach. CONCLUSION: CRFs can be considered a proxy for representing the information needs of their respective cancer types. Mining these information needs can serve as a valuable resource for expanding existing standards to ensure they comprehensively represent relevant clinical data without loss of granularity.


Subjects
Colorectal Neoplasms; Electronic Health Records; Clinical Trials as Topic; Colorectal Neoplasms/therapy; Humans; Surveys and Questionnaires
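The subgroup-discovery step can be sketched with scikit-learn's LDA topic model over auto-filled CRF answers; the answer strings below are invented examples.

```python
# Topic modeling over CRF answers; per-patient topic mixtures suggest subgroups.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

crf_answers = [
    "stage III adenocarcinoma KRAS mutant chemotherapy",
    "stage II adenocarcinoma resection adjuvant chemotherapy",
    "metastatic disease KRAS mutant targeted therapy",
]
X = CountVectorizer().fit_transform(crf_answers)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
patient_topics = lda.transform(X)    # rows: patients, columns: topic weights
print(patient_topics.round(2))
```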
14.
BMJ Surg Interv Health Technol ; 3(1): e000089, 2021.
Article in English | MEDLINE | ID: mdl-35047806

ABSTRACT

OBJECTIVES: To determine the feasibility of using real-world data to assess the safety and effectiveness of two cardiac ablation catheters for the treatment of persistent atrial fibrillation and ischaemic ventricular tachycardia. DESIGN: Retrospective cohort. SETTING: Three health systems in the USA. PARTICIPANTS: Patients receiving ablation with the two ablation catheters of interest at any of the three health systems. MAIN OUTCOME MEASURES: Feasibility of identifying the medical devices and participant populations of interest as well as the duration of follow-up and positive predictive values (PPVs) for serious safety (ischaemic stroke, acute heart failure and cardiac tamponade) and effectiveness (arrhythmia-related hospitalisation) clinical outcomes of interest compared with manual chart validation by clinicians. RESULTS: Overall, the catheter of interest for treatment of persistent atrial fibrillation was used for 4280 ablations and the catheter of interest for ischaemic ventricular tachycardia was used 1516 times across the data available within the three health systems. The duration of patient follow-up in the three health systems ranged from 91% to 97% at ≥7 days, 89% to 96% at ≥30 days, 77% to 90% at ≥6 months and 66% to 84% at ≥1 year. PPVs were 63.4% for ischaemic stroke, 96.4% for acute heart failure, 100% at one health system for cardiac tamponade and 55.7% for arrhythmia-related hospitalisation. CONCLUSIONS: It is feasible to use real-world health system data to evaluate the safety and effectiveness of cardiac ablation catheters, though evaluations must consider the implications of variation in follow-up and endpoint ascertainment among health systems.

15.
Learn Health Syst ; 4(4): e10233, 2020 Oct.
Article in English | MEDLINE | ID: mdl-33083538

ABSTRACT

INTRODUCTION: Electronic health record (EHR)-driven phenotyping is a critical first step in generating biomedical knowledge from EHR data. Despite recent progress, current phenotyping approaches are manual, time-consuming, error-prone, and platform-specific. This results in duplication of effort and highly variable results across systems and institutions, and is not scalable or portable. In this work, we investigate how the nascent Clinical Quality Language (CQL) can address these issues and enable high-throughput, cross-platform phenotyping. METHODS: We selected a clinically validated heart failure (HF) phenotype definition and translated it into CQL, then developed a CQL execution engine to integrate with the Observational Health Data Sciences and Informatics (OHDSI) platform. We executed the phenotype definition at two large academic medical centers, Northwestern Medicine and Weill Cornell Medicine, and conducted results verification (n = 100) to determine precision and recall. We additionally executed the same phenotype definition against two different data platforms, OHDSI and Fast Healthcare Interoperability Resources (FHIR), using the same underlying dataset and compared the results. RESULTS: CQL is expressive enough to represent the HF phenotype definition, including Boolean and aggregate operators, and temporal relationships between data elements. The language design also enabled the implementation of a custom execution engine with relative ease, and results verification at both sites revealed that precision and recall were both 100%. Cross-platform execution resulted in identical patient cohorts generated by both data platforms. CONCLUSIONS: CQL supports the representation of arbitrarily complex phenotype definitions, and our execution engine implementation demonstrated cross-platform execution against two widely used clinical data platforms. The language thus has the potential to help address current limitations with portability in EHR-driven phenotyping and scale in learning health systems.

16.
Learn Health Syst ; 4(4): e10241, 2020 Oct.
Article in English | MEDLINE | ID: mdl-33083540

ABSTRACT

OBJECTIVE: To identify depression subphenotypes from electronic health records (EHRs) using machine learning methods, and to analyze their characteristics with respect to patient demographics, comorbidities, and medications. MATERIALS AND METHODS: Using EHRs from the INSIGHT Clinical Research Network (CRN) database, multiple machine learning (ML) algorithms were applied to analyze 11 275 patients with depression and discern subphenotypes with distinct characteristics. RESULTS: Using these computational approaches, we derived three depression subphenotypes. Phenotype_A (n = 2791; 31.35%) included the oldest patients (mean (SD) age, 72.55 (14.93) years), who had the most comorbidities and took the most medications; the most common comorbidities in this cluster were hyperlipidemia, hypertension, and diabetes. Phenotype_B (mean (SD) age, 68.44 (19.09) years) was the largest cluster (n = 4687; 52.65%) and included patients suffering from moderate loss of body function; asthma, fibromyalgia, and chronic pain and fatigue (CPF) were common comorbidities in this subphenotype. Phenotype_C (n = 1452; 16.31%) included the youngest patients (mean (SD) age, 63.47 (18.81) years), who had the fewest comorbidities and took fewer medications; anxiety and tobacco use were common in this subphenotype. CONCLUSION: Computationally deriving depression subtypes can provide meaningful insights and improve understanding of depression as a heterogeneous disorder. Further investigation is needed to assess the utility of these derived phenotypes for informing clinical trial design and interpretation in routine patient care.
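A minimal sketch of the clustering step: k-means over standardized per-patient EHR feature vectors. The feature matrix is simulated, and the paper compares multiple algorithms rather than committing to k-means alone.

```python
# Cluster simulated patient feature vectors into three candidate subphenotypes.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
features = rng.normal(size=(200, 12))     # 200 patients, 12 EHR-derived features
X = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
for k in range(3):
    print(f"subphenotype {k}: n={np.sum(labels == k)}")
```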

17.
Learn Health Syst ; 4(4): e10246, 2020 Oct.
Article in English | MEDLINE | ID: mdl-33083543

ABSTRACT

INTRODUCTION: We sought to assess longitudinal electronic health records (EHRs) using machine learning (ML) methods to computationally derive probable Alzheimer's disease (AD) and related dementia subphenotypes. METHODS: A retrospective analysis of EHR data from a cohort of 7587 patients seen at a large, multi-specialty urban academic medical center in New York was conducted. Subphenotypes were derived using hierarchical clustering from the clinical data of 792 probable AD patients (cases) who had received at least one diagnosis of AD. The other 6795 patients, labeled as controls, were matched with the cases on age and gender and randomly selected at a ratio of 9:1. Prediction models with multiple ML algorithms were trained on this cohort using 5-fold cross-validation, and XGBoost was used to rank variable importance. RESULTS: Four subphenotypes were computationally derived. Subphenotype A (n = 273; 28.2%) had more patients with cardiovascular diseases; subphenotype B (n = 221; 27.9%) had more patients with mental health illnesses, such as depression and anxiety; patients in subphenotype C (n = 183; 23.1%) were overall older (mean (SD) age, 79.5 (5.4) years) and had the most comorbidities, including diabetes, cardiovascular diseases, and mental health disorders; and subphenotype D (n = 115; 14.5%) included patients who took anti-dementia drugs and had sensory problems, such as deafness and hearing impairment. The 0-year prediction model for AD risk achieved an area under the receiver operating characteristic curve (AUC) of 0.764 (SD: 0.02); the 6-month model, 0.751 (SD: 0.02); the 1-year model, 0.752 (SD: 0.02); the 2-year model, 0.749 (SD: 0.03); and the 3-year model, 0.735 (SD: 0.03). Based on variable importance, the top-ranked comorbidities included depression, stroke/transient ischemic attack, hypertension, anxiety, mobility impairments, and atrial fibrillation. The top-ranked medications included anti-dementia drugs, antipsychotics, antiepileptics, and antidepressants. CONCLUSIONS: Four subphenotypes were computationally derived that correlated with cardiovascular diseases and mental health illnesses. ML algorithms based on patient demographics, diagnoses, and treatment demonstrated promising results in predicting the risk of developing AD at different time points across an individual's lifespan.
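The risk-prediction and variable-ranking steps can be sketched with the `xgboost` package as below; the binary comorbidity/medication features and labels are simulated stand-ins for the matched case-control cohort.

```python
# XGBoost risk model with 5-fold CV and feature-importance ranking.
import numpy as np
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier   # assumption: pip package `xgboost`

rng = np.random.default_rng(5)
X = rng.integers(0, 2, size=(1000, 8)).astype(float)  # comorbidity/medication flags
y = rng.integers(0, 2, size=1000)                     # probable AD vs control

model = XGBClassifier(n_estimators=50, max_depth=3, eval_metric="logloss")
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()  # 5-fold CV
importance = model.fit(X, y).feature_importances_      # variable ranking
print(f"AUC={auc:.3f}, top features: {importance.argsort()[::-1][:3]}")
```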

18.
J Biomed Inform ; 110: 103541, 2020 10.
Article in English | MEDLINE | ID: mdl-32814201

ABSTRACT

Free-text problem descriptions are brief explanations of patient diagnoses and issues, commonly found in problem lists and other prominent areas of the medical record. These compact representations often express complex and nuanced medical conditions, making their semantics challenging to fully capture and standardize. In this study, we describe a framework for transforming free-text problem descriptions into standardized Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) models. This approach leverages a combination of domain-specific dependency parsers, Bidirectional Encoder Representations from Transformers (BERT) natural language models, and cui2vec Unified Medical Language System (UMLS) concept vectors to align extracted concepts from free-text problem descriptions into structured FHIR models. A neural network classification model is used to classify thirteen relationship types between concepts, facilitating mapping to the FHIR Condition resource. We use data programming, a weak supervision approach, to eliminate the need for a manually annotated training corpus. Shapley values, a mechanism to quantify contribution, are used to interpret the impact of model features. We found that our methods identified the focus concept, or primary clinical concern of the problem description, with an F1 score of 0.95. Relationships from the focus to other modifying concepts were extracted with an F1 score of 0.90. When classifying relationships, our model achieved a 0.89 weighted average F1 score, enabling accurate mapping of attributes into HL7 FHIR models. We also found that the BERT input representation predominantly contributed to the classifier decision as shown by the Shapley values analysis.
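A hedged sketch of the Shapley-value interpretation step using the `shap` package, with a small tabular model standing in for the BERT-based relationship classifier; the feature meanings are invented.

```python
# Shapley values quantifying per-feature contribution to classifier decisions.
import numpy as np
import shap                                   # assumption: pip package `shap`
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(8)
X = rng.normal(size=(300, 5))                 # stand-ins for BERT/parser/cui2vec features
y = (X[:, 0] + X[:, 1] > 0).astype(int)       # toy relationship label

model = RandomForestClassifier(random_state=0).fit(X, y)
explainer = shap.Explainer(model.predict, X[:100])   # model-agnostic explainer
sv = explainer(X[:10])                        # Shapley values for 10 samples
print(np.abs(sv.values).mean(axis=0))         # mean |contribution| per feature
```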


Subjects
Electronic Health Records; Health Level Seven; Humans; Reference Standards; Software; Unified Medical Language System
19.
Sensors (Basel) ; 20(12)2020 Jun 24.
Article in English | MEDLINE | ID: mdl-32599907

ABSTRACT

Sensor fault detection for wind turbines plays an important role in improving the reliability and stable operation of turbines. The supervisory control and data acquisition (SCADA) system of a wind turbine provides promising insights into sensor fault detection due to the accessibility of the data and the abundance of sensor information. However, SCADA data are essentially multivariate time series with inherent spatio-temporal correlation characteristics, which has not been well considered in existing wind turbine fault detection research. This paper proposes a novel classification-based fault detection method for wind turbine sensors. To better capture the spatio-temporal characteristics hidden in SCADA data, a multiscale spatio-temporal convolutional deep belief network (MSTCDBN) was developed to perform feature learning and classification for sensor fault detection. A major advantage of the proposed method is that it can not only learn the spatial correlation information between several different variables but also capture the temporal characteristics of each variable. Furthermore, its multiscale learning capability can uncover interactive characteristics between variables at different filter scales. A generic wind turbine benchmark model was used to evaluate the proposed approach. The comparative results demonstrate that the proposed method can significantly enhance fault detection performance.
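An illustrative PyTorch sketch of the "multiscale" idea: parallel 1-D convolutions with different kernel sizes over multivariate SCADA windows. This is a plain CNN stand-in, not the paper's convolutional deep belief network.

```python
# Parallel convolutions at three temporal scales, pooled and concatenated.
import torch
import torch.nn as nn

class MultiscaleConv(nn.Module):
    def __init__(self, n_channels=10, n_classes=2):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(n_channels, 16, kernel_size=k, padding=k // 2)
            for k in (3, 5, 7)                     # three temporal scales
        )
        self.fc = nn.Linear(3 * 16, n_classes)

    def forward(self, x):                          # x: (batch, channels, time)
        feats = [torch.relu(b(x)).mean(dim=2) for b in self.branches]
        return self.fc(torch.cat(feats, dim=1))

model = MultiscaleConv()
scores = model(torch.randn(8, 10, 128))            # 8 windows, 10 sensors, 128 steps
print(scores.shape)                                # torch.Size([8, 2])
```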

20.
AMIA Jt Summits Transl Sci Proc ; 2020: 517-526, 2020.
Article in English | MEDLINE | ID: mdl-32477673

ABSTRACT

While using data standards can facilitate research by making it easier to share data, manually mapping to data standards creates an obstacle to their adoption. Semi-automated mapping strategies can reduce the manual mapping burden. Machine learning approaches, such as artificial neural networks, can predict mappings between clinical data standards but are limited by the need for training data. We developed a graph database that incorporates the Biomedical Research Integrated Domain Group (BRIDG) model, Common Data Elements (CDEs) from the National Cancer Institute's (NCI) Cancer Data Standards Registry and Repository, and the NCI Thesaurus. We then used a shortest-path algorithm to predict mappings from CDEs to classes in the BRIDG model. The resulting graph database provides a robust semantic framework for analysis and quality assurance testing. Using the graph database to predict CDE-to-BRIDG class mappings was limited by the subjective nature of mapping and by data quality issues.
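The prediction mechanism can be sketched with networkx: the BRIDG class nearest a CDE in the graph (by shortest path) is the predicted mapping. Node names here are invented; the real graph is built from the caDSR CDEs, the NCI Thesaurus, and the BRIDG model.

```python
# Predict a CDE-to-BRIDG mapping as the nearest BRIDG class by shortest path.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("CDE:example-cde", "NCIt:conceptA"),   # CDE annotated with a thesaurus concept
    ("NCIt:conceptA", "BRIDG:ClassX"),      # concept linked to a BRIDG class
    ("NCIt:conceptA", "NCIt:conceptB"),
    ("NCIt:conceptB", "BRIDG:ClassY"),
])
bridg = [n for n in G if n.startswith("BRIDG:")]
dist = {c: nx.shortest_path_length(G, "CDE:example-cde", c) for c in bridg}
predicted = min(dist, key=dist.get)         # nearest BRIDG class = predicted mapping
print(predicted, dist)
```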
