Results 1 - 20 of 134
1.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35649342

ABSTRACT

Internal validation is the most popular evaluation strategy used for drug-target predictive models. However, the simple random shuffling used in cross-validation is not always ideal for large, diverse and copious datasets, as it can introduce bias. As a result, these predictive models cannot be comprehensively evaluated to provide insight into their general performance across a variety of use cases (e.g. permutations of different levels of connectedness and categories in drug and target space, as well as validations based on different data sources). In this work, we introduce a benchmark, BETA, that addresses this gap by (i) providing an extensive multipartite network consisting of 0.97 million biomedical concepts and 8.5 million associations, in addition to 62 million drug-drug and protein-protein similarities, and (ii) presenting evaluation strategies that reflect seven cases (i.e. general evaluation, screening with different levels of connectivity, target and drug screening based on categories, searching for specific drugs and targets, and drug repurposing for specific diseases), a total of seven Tests (comprising 344 Tasks in total) across multiple sampling and validation strategies. Six state-of-the-art methods covering two broad input data types (chemical structure- and gene sequence-based, and network-based) were tested across all the developed Tasks. The best- and worst-performing cases were analyzed to demonstrate the ability of the proposed benchmark to identify limitations of the tested methods. The results highlight BETA as a benchmark for selecting computational strategies for drug repurposing and target discovery.
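One axis of the benchmark above is screening at different levels of connectivity. A minimal sketch of how such a degree-stratified evaluation split could be constructed (the drug/target names, bucket thresholds, and split ratio are invented for illustration and are not BETA's actual Task definitions):

```python
import random
from collections import defaultdict

def degree_stratified_split(edges, holdout_frac=0.1, seed=42):
    """Hold out a fraction of (drug, target) pairs, then bucket the
    held-out pairs by the drug's connectivity in the full graph so
    each stratum can be scored separately."""
    random.seed(seed)
    degree = defaultdict(int)
    for drug, _ in edges:
        degree[drug] += 1
    shuffled = list(edges)
    random.shuffle(shuffled)
    n_test = int(len(shuffled) * holdout_frac)
    test, train = shuffled[:n_test], shuffled[n_test:]
    buckets = {"sparse (<5)": [], "medium (5-14)": [], "dense (>=15)": []}
    for drug, target in test:
        d = degree[drug]
        key = ("sparse (<5)" if d < 5 else
               "medium (5-14)" if d < 15 else "dense (>=15)")
        buckets[key].append((drug, target))
    return train, buckets

# Toy bipartite graph: drug k is connected to roughly 2k+1 targets.
edges = [(f"drug{int(i ** 0.5)}", f"target{i}") for i in range(300)]
train, buckets = degree_stratified_split(edges)
for name, pairs in buckets.items():
    print(name, len(pairs))
```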


Subject(s)
Benchmarking, Drug Development, Algorithms, Drug Evaluation, Preclinical, Drug Repositioning/methods, Proteins/genetics
2.
J Biomed Inform ; 148: 104534, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37918622

ABSTRACT

This work continues along a visionary path of using Semantic Web standards such as RDF and ShEx to make healthcare data easier to integrate for research and leading-edge patient care. The work extends the ability to use ShEx schemas to validate FHIR RDF data, thereby enhancing the semantic web ecosystem for working with FHIR and non-FHIR data using the same ShEx validation framework. It updates FHIR's ShEx schemas to fix outstanding issues and reflect changes in the definition of FHIR RDF. In addition, it experiments with expressing FHIRPath constraints (which are not captured in the XML or JSON schemas) in ShEx schemas. These extended ShEx schemas were incorporated into the FHIR R5 specification and used to successfully validate FHIR R5 examples that are included with the FHIR specification, revealing several errors in the examples.
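The ShEx-over-RDF validation pattern described above can be illustrated with a toy schema. A minimal sketch assuming the PyShEx package (`pip install PyShEx`); the shape and data here are generic placeholders, not the FHIR R5 ShEx schemas discussed in the abstract:

```python
from pyshex import ShExEvaluator

# A toy ShEx schema: a node must have an allowed status and may have
# a decimal value. Placeholder vocabulary, not FHIR's.
shex = """
PREFIX ex: <http://example.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
start = @<ObservationShape>
<ObservationShape> {
  ex:status ["final" "preliminary"] ;
  ex:value xsd:decimal ?
}
"""

# Toy RDF data in Turtle to validate against the schema above.
rdf = """
@prefix ex: <http://example.org/> .
ex:obs1 ex:status "final" ; ex:value 7.4 .
"""

for r in ShExEvaluator(rdf=rdf, schema=shex,
                       focus="http://example.org/obs1").evaluate():
    print(r.focus, "conforms" if r.result else f"fails: {r.reason}")
```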


Asunto(s)
Ecosistema , Registros Electrónicos de Salud , Humanos , Atención a la Salud
3.
J Biomed Inform ; 144: 104442, 2023 08.
Article in English | MEDLINE | ID: mdl-37429512

ABSTRACT

OBJECTIVE: We develop a deep learning framework based on the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model, using unstructured clinical notes from electronic health records (EHRs), to predict the risk of disease progression from Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD). METHODS: We identified 3657 patients diagnosed with MCI, together with their progress notes, from the Northwestern Medicine Enterprise Data Warehouse (NMEDW) between 2000 and 2020. Progress notes written no later than the first MCI diagnosis were used for prediction. We first preprocessed the notes by deidentification, cleaning, and splitting into sections, and then pre-trained a BERT model for AD (named AD-BERT) on the preprocessed notes, starting from the publicly available Bio+Clinical BERT. Each section of a patient's notes was embedded into a vector representation by AD-BERT; the section vectors were then combined by global max pooling and a fully connected network to compute the probability of MCI-to-AD progression. For validation, we conducted a similar set of experiments on 2563 MCI patients identified at Weill Cornell Medicine (WCM) during the same timeframe. RESULTS: Compared with 7 baseline models, the AD-BERT model achieved the best performance on both datasets, with an Area Under the receiver operating characteristic Curve (AUC) of 0.849 and F1 score of 0.440 on the NMEDW dataset, and an AUC of 0.883 and F1 score of 0.680 on the WCM dataset. CONCLUSION: The use of EHRs for AD-related research is promising, and AD-BERT shows superior predictive performance in modeling MCI-to-AD progression. Our study demonstrates the utility of pre-trained language models and clinical notes in predicting MCI-to-AD progression, which could have important implications for improving early detection of and intervention for AD.
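The section-embed, max-pool, and classify pattern described above can be sketched as follows. This is an illustrative skeleton, not the authors' AD-BERT: it loads the public Bio+Clinical BERT checkpoint from Hugging Face, assumes the `torch` and `transformers` packages, and the head dimensions are arbitrary.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SectionPoolClassifier(nn.Module):
    def __init__(self, encoder="emilyalsentzer/Bio_ClinicalBERT", hidden=768):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder)
        self.head = nn.Sequential(nn.Linear(hidden, 64), nn.ReLU(),
                                  nn.Linear(64, 1))

    def forward(self, section_batches):
        # Encode each note section to its [CLS] vector.
        vecs = [self.encoder(**b).last_hidden_state[:, 0, :]
                for b in section_batches]
        stacked = torch.cat(vecs, dim=0)         # (n_sections, hidden)
        pooled, _ = stacked.max(dim=0)           # global max-pool over sections
        return torch.sigmoid(self.head(pooled))  # P(MCI -> AD progression)

tok = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = SectionPoolClassifier()
sections = ["Patient reports memory decline.", "MMSE 24/30 today."]
batches = [tok(s, return_tensors="pt", truncation=True) for s in sections]
print(model(batches).item())
```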


Subject(s)
Alzheimer Disease, Cognitive Dysfunction, Humans, Alzheimer Disease/diagnosis, Cognitive Dysfunction/diagnosis, Disease Progression
4.
J Biomed Inform ; 127: 104002, 2022 03.
Article in English | MEDLINE | ID: mdl-35077901

ABSTRACT

OBJECTIVE: The large-scale collection of observational data and digital technologies could help curb the COVID-19 pandemic. However, the coexistence of multiple Common Data Models (CDMs) and the lack of extract, transform, and load (ETL) tools between different CDMs create potential interoperability issues between data systems. The objective of this study is to design, develop, and evaluate an ETL tool that transforms PCORnet CDM-formatted data into the OMOP CDM. METHODS: We developed an open-source ETL tool to facilitate data conversion from the PCORnet CDM to the OMOP CDM. The ETL tool was evaluated using a dataset of 1000 patients randomly selected from the PCORnet CDM at Mayo Clinic. Information loss, data mapping accuracy, and gap analysis approaches were used to assess the tool's performance. We designed an experiment around a real-world COVID-19 surveillance task to assess the feasibility of the ETL tool, and also assessed its capacity for COVID-19 data surveillance using the data collection criteria of the MN EHR Consortium COVID-19 project. RESULTS: After the ETL process, all records of the 1000 patients from 18 PCORnet CDM tables were successfully transformed into 12 OMOP CDM tables. Information loss for all concept mapping was less than 0.61%. The string mapping process for unit concepts lost 2.84% of records. Almost all fields in the manual mapping process achieved 0% information loss, except the specialty concept mapping. Moreover, the mapping accuracy for all fields was 100%. The COVID-19 surveillance task collected almost the same set of cases (99.3% overlap) from the original PCORnet CDM and the target OMOP CDM separately. Finally, all data elements for the MN EHR Consortium COVID-19 project could be captured from both the PCORnet CDM and the OMOP CDM. CONCLUSION: We demonstrated that our ETL tool satisfies the data conversion requirements between the PCORnet CDM and the OMOP CDM. This work facilitates data retrieval, communication, sharing, and analysis between different institutions, not only for COVID-19-related projects but also for other real-world, evidence-based observational studies.
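At field level, the ETL and information-loss bookkeeping can be pictured as below. A minimal sketch: the table layout, concept map, and loss definition are simplified placeholders, not the tool's actual PCORnet-to-OMOP mappings.

```python
# Hypothetical source-code -> OMOP concept_id map (illustrative values).
concept_map = {
    "LOINC:2160-0": 3016723,
    "LOINC:718-7": 3000963,
}

def transform_lab_rows(pcornet_rows):
    """Convert simplified PCORnet-style lab rows into OMOP-style
    measurement rows, tallying unmapped records as information loss."""
    omop_rows, unmapped = [], 0
    for row in pcornet_rows:
        concept_id = concept_map.get(row["lab_loinc"])
        if concept_id is None:
            unmapped += 1            # counts toward information loss
            continue
        omop_rows.append({
            "person_id": row["patid"],
            "measurement_concept_id": concept_id,
            "value_as_number": row["result_num"],
        })
    loss = unmapped / len(pcornet_rows) if pcornet_rows else 0.0
    return omop_rows, loss

rows = [
    {"patid": 1, "lab_loinc": "LOINC:2160-0", "result_num": 1.1},
    {"patid": 2, "lab_loinc": "LOINC:9999-9", "result_num": 3.0},
]
out, loss = transform_lab_rows(rows)
print(len(out), f"rows converted, information loss: {loss:.2%}")
```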


Subject(s)
COVID-19, COVID-19/epidemiology, Databases, Factual, Electronic Health Records, Humans, Information Storage and Retrieval, Pandemics, SARS-CoV-2
5.
J Biomed Inform ; 134: 104201, 2022 10.
Article in English | MEDLINE | ID: mdl-36089199

ABSTRACT

BACKGROUND: Knowledge graphs (KGs) play a key role in enabling explainable artificial intelligence (AI) applications in healthcare. Constructing clinical knowledge graphs (CKGs) from heterogeneous electronic health records (EHRs) has long been desired by the research and healthcare AI communities. From the standardization perspective, community-based standards such as the Fast Healthcare Interoperability Resources (FHIR) and the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) are increasingly used to represent and standardize EHR data for clinical data analytics. However, the potential of such standards for building CKGs has not been well investigated. OBJECTIVE: To develop and evaluate methods and tools that expose OMOP CDM-based clinical data repositories as virtual clinical KGs compliant with the FHIR Resource Description Framework (RDF) specification. METHODS: We developed a system called FHIR-Ontop-OMOP to generate virtual clinical KGs from OMOP relational databases. We leveraged an OMOP CDM-based Medical Information Mart for Intensive Care (MIMIC-III) data repository to evaluate the FHIR-Ontop-OMOP system in terms of the faithfulness of the data transformation and the conformance of the generated CKGs to the FHIR RDF specification. RESULTS: A beta version of the system has been released. More than 100 data element mappings from 11 OMOP CDM clinical data, health system, and vocabulary tables were implemented in the system, covering 11 FHIR resources. The virtual CKG generated from MIMIC-III contains 46,520 instances of FHIR Patient, 716,595 instances of Condition, 1,063,525 instances of Procedure, 24,934,751 instances of MedicationStatement, 365,181,104 instances of Observation, and 4,779,672 instances of CodeableConcept. Patient counts identified by five pairs of SQL (over the MIMIC database) and SPARQL (over the virtual CKG) queries were identical, confirming the faithfulness of the data transformation. The generated CKG in RDF triples for 100 patients was fully conformant with the FHIR RDF specification. CONCLUSION: The FHIR-Ontop-OMOP system can expose an OMOP database as a FHIR-compliant RDF graph. It provides a meaningful use case demonstrating the potential enabled by the interoperability between FHIR and the OMOP CDM. Generated clinical KGs in FHIR RDF provide a semantic foundation for explainable AI applications in healthcare.
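The faithfulness check described above amounts to issuing paired queries, SQL against the relational store and SPARQL against the virtual graph, and comparing counts. A hedged sketch (the database path, endpoint URL, and table/class names are placeholders; assumes the `SPARQLWrapper` package):

```python
import sqlite3
from SPARQLWrapper import SPARQLWrapper, JSON

def count_sql(db_path):
    # Distinct patients in an OMOP-style relational table.
    con = sqlite3.connect(db_path)
    (n,) = con.execute(
        "SELECT COUNT(DISTINCT person_id) FROM person").fetchone()
    return n

def count_sparql(endpoint):
    # Distinct fhir:Patient instances in the virtual RDF graph.
    q = """
    PREFIX fhir: <http://hl7.org/fhir/>
    SELECT (COUNT(DISTINCT ?p) AS ?n)
    WHERE { ?p a fhir:Patient }
    """
    sw = SPARQLWrapper(endpoint)
    sw.setQuery(q)
    sw.setReturnFormat(JSON)
    res = sw.query().convert()
    return int(res["results"]["bindings"][0]["n"]["value"])

# The transformation is faithful for this query when both counts agree:
# assert count_sql("omop.db") == count_sparql("http://localhost:8080/sparql")
```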


Subject(s)
Artificial Intelligence, Pattern Recognition, Automated, Data Warehousing, Delivery of Health Care, Electronic Health Records, Humans
6.
J Med Internet Res ; 24(7): e38584, 2022 07 06.
Article in English | MEDLINE | ID: mdl-35658098

ABSTRACT

BACKGROUND: Multiple types of biomedical association knowledge graphs, including COVID-19-related ones, have been constructed from co-occurring biomedical entities retrieved from recent literature. However, applications derived from these raw graphs (eg, association predictions among genes, drugs, and diseases) carry a high risk of false-positive predictions, as co-occurrence in the literature does not always imply a true biomedical association between two entities. OBJECTIVE: Data quality plays an important role in training deep neural network models; however, most current work in this area focuses on improving a model's performance under the assumption that the preprocessed data are clean. Here, we studied how to remove noise from raw knowledge graphs with limited labeled information. METHODS: The proposed framework used generative deep neural networks to generate a graph that can distinguish the unknown associations in the raw training graph. Two generative adversarial network models, NetGAN and Cross-Entropy Low-rank Logits (CELL), were adopted for edge classification (ie, link prediction), leveraging unlabeled link information based on a real knowledge graph built from LitCovid and PubTator. RESULTS: The link prediction performance, even in the extreme case of a 1:9 training-to-test data ratio, demonstrated that the proposed method still achieved favorable results (area under the receiver operating characteristic curve >0.8 for the synthetic data set and 0.7 for the real data set), despite the limited amount of training data available. CONCLUSIONS: Our preliminary findings showed that the proposed framework achieved promising results for removing noise during data preprocessing of biomedical knowledge graphs, potentially improving the performance of downstream applications by providing cleaner data.
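The evaluation set-up can be sketched as follows, with a deliberately trivial degree-product scorer standing in for the NetGAN/CELL generators; the graph is synthetic and the numbers illustrative. Assumes `scikit-learn`.

```python
import random
from collections import Counter
from sklearn.metrics import roc_auc_score

random.seed(0)
# Synthetic graph: true edges live inside a 25-node community,
# noisy co-occurrence edges are drawn across all 50 nodes.
true_edges = [(random.randrange(25), random.randrange(25)) for _ in range(400)]
noise_edges = [(random.randrange(50), random.randrange(50)) for _ in range(400)]

labeled = [(e, 1) for e in true_edges] + [(e, 0) for e in noise_edges]
random.shuffle(labeled)
n_train = len(labeled) // 10                 # extreme 1:9 train/test split
train, test = labeled[:n_train], labeled[n_train:]

# Node degrees observed on the small trusted slice act as the "model".
deg = Counter()
for (u, v), label in train:
    if label:
        deg[u] += 1
        deg[v] += 1

scores = [deg[u] * deg[v] for (u, v), _ in test]
labels = [label for _, label in test]
print("AUC:", round(roc_auc_score(labels, scores), 3))
```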


Subject(s)
COVID-19, Humans, Knowledge, Neural Networks, Computer, Pattern Recognition, Automated, ROC Curve
7.
J Biomed Inform ; 117: 103755, 2021 05.
Article in English | MEDLINE | ID: mdl-33781919

ABSTRACT

Resource Description Framework (RDF) is one of the three standardized data formats in the HL7 Fast Healthcare Interoperability Resources (FHIR) specification and is being used by healthcare and research organizations to join FHIR and non-FHIR data. However, RDF had previously not been integrated into popular FHIR tooling packages, hindering the adoption of FHIR RDF in the semantic web and other communities. The objective of this study is to develop and evaluate a Java-based FHIR RDF data transformation toolkit to facilitate the use and validation of FHIR RDF data. We extended the popular HAPI FHIR tooling to add RDF support, enabling FHIR data in XML or JSON to be transformed to or from RDF. We also developed an RDF Shape Expressions (ShEx)-based validation framework to verify the conformance of FHIR RDF data to the ShEx schemas provided in the FHIR specification for versions R4 and R5. The effectiveness of ShEx validation was demonstrated by testing it against the 2693 FHIR R4 examples and 2197 FHIR R5 examples included in the FHIR specification. Five types of errors were revealed in the R5 examples: missing properties, unknown elements, missing resource types, invalid attribute values, and unknown resource names, demonstrating the value of ShEx in the quality assurance of the evolving R5 development. This FHIR RDF data transformation and validation framework, based on HAPI and ShEx, is robust and ready for community use in adopting FHIR RDF, improving FHIR data quality, and evolving the FHIR specification.


Subject(s)
Delivery of Health Care, Electronic Health Records
8.
J Biomed Inform ; 110: 103541, 2020 10.
Article in English | MEDLINE | ID: mdl-32814201

ABSTRACT

Free-text problem descriptions are brief explanations of patient diagnoses and issues, commonly found in problem lists and other prominent areas of the medical record. These compact representations often express complex and nuanced medical conditions, making their semantics challenging to fully capture and standardize. In this study, we describe a framework for transforming free-text problem descriptions into standardized Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) models. This approach leverages a combination of domain-specific dependency parsers, Bidirectional Encoder Representations from Transformers (BERT) natural language models, and cui2vec Unified Medical Language System (UMLS) concept vectors to align extracted concepts from free-text problem descriptions into structured FHIR models. A neural network classification model is used to classify thirteen relationship types between concepts, facilitating mapping to the FHIR Condition resource. We use data programming, a weak supervision approach, to eliminate the need for a manually annotated training corpus. Shapley values, a mechanism to quantify contribution, are used to interpret the impact of model features. We found that our methods identified the focus concept, or primary clinical concern of the problem description, with an F1 score of 0.95. Relationships from the focus to other modifying concepts were extracted with an F1 score of 0.90. When classifying relationships, our model achieved a 0.89 weighted average F1 score, enabling accurate mapping of attributes into HL7 FHIR models. We also found that the BERT input representation predominantly contributed to the classifier decision as shown by the Shapley values analysis.
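The final mapping step, from an extracted focus concept plus modifier relations into an HL7 FHIR Condition resource, can be sketched as below. The relation labels, CUIs, and coding-system URI are illustrative assumptions, not the study's actual outputs.

```python
import json

# Hypothetical output of the extraction pipeline for one problem
# description, e.g. "severe diabetic kidney disease".
extraction = {
    "focus": {"cui": "C0011849", "text": "diabetes mellitus"},
    "relations": [
        {"type": "bodySite", "cui": "C0022646", "text": "kidney"},
        {"type": "severity", "cui": "C0205082", "text": "severe"},
    ],
}

def to_fhir_condition(ex):
    """Assemble a minimal FHIR Condition resource from the focus
    concept and its modifier relations."""
    cond = {
        "resourceType": "Condition",
        "code": {"coding": [{
            "system": "http://www.nlm.nih.gov/research/umls",  # illustrative
            "code": ex["focus"]["cui"],
            "display": ex["focus"]["text"],
        }]},
    }
    for rel in ex["relations"]:
        concept = {"coding": [{"code": rel["cui"], "display": rel["text"]}]}
        if rel["type"] == "bodySite":
            cond.setdefault("bodySite", []).append(concept)
        elif rel["type"] == "severity":
            cond["severity"] = concept
    return cond

print(json.dumps(to_fhir_condition(extraction), indent=2))
```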


Subject(s)
Electronic Health Records, Health Level Seven, Humans, Reference Standards, Software, Unified Medical Language System
9.
J Biomed Inform ; 102: 103361, 2020 02.
Article in English | MEDLINE | ID: mdl-31911172

ABSTRACT

Acute Kidney Injury (AKI) is a common clinical syndrome characterized by the rapid loss of kidney excretory function, which aggravates the clinical severity of other diseases in a large number of hospitalized patients. Accurate early prediction of AKI can enable timely interventions and treatments. However, AKI is highly heterogeneous, so identifying AKI sub-phenotypes can lead to an improved understanding of the disease pathophysiology and to more targeted clinical interventions. This study used a memory network-based deep learning approach to discover AKI sub-phenotypes using structured and unstructured electronic health record (EHR) data of patients before AKI diagnosis. We leveraged a real-world critical care EHR corpus including 37,486 ICU stays. Our approach identified three distinct sub-phenotypes. Sub-phenotype I (average age 63.03±17.25 years) is characterized by mild loss of kidney excretory function (Serum Creatinine (SCr) 1.55±0.34 mg/dL, estimated Glomerular Filtration Rate (eGFR) 107.65±54.98 mL/min/1.73 m2); these patients are more likely to develop stage I AKI. Sub-phenotype II (average age 66.81±10.43 years) is characterized by severe loss of kidney excretory function (SCr 1.96±0.49 mg/dL, eGFR 82.19±55.92 mL/min/1.73 m2); these patients are more likely to develop stage III AKI. Sub-phenotype III (average age 65.07±11.32 years) is characterized by moderate loss of kidney excretory function and is thus more likely to develop stage II AKI (SCr 1.69±0.32 mg/dL, eGFR 93.97±56.53 mL/min/1.73 m2). Both SCr and eGFR differ significantly across the three sub-phenotypes under statistical testing with post hoc analysis, and the conclusion still holds after age adjustment.
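As a deliberately simplified stand-in for the paper's memory network approach (k-means on two labs rather than deep representation learning), the sub-phenotype separation and per-cluster summaries can be sketched on synthetic data that loosely echoes the reported statistics; assumes `numpy` and `scikit-learn`.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
# Synthetic cohorts loosely echoing the reported cluster statistics.
scr = np.concatenate([rng.normal(1.55, 0.34, 300),
                      rng.normal(1.96, 0.49, 300),
                      rng.normal(1.69, 0.32, 300)])
egfr = np.concatenate([rng.normal(107.65, 54.98, 300),
                       rng.normal(82.19, 55.92, 300),
                       rng.normal(93.97, 56.53, 300)])
X = np.column_stack([scr, egfr])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
for k in range(3):
    c = X[labels == k]
    print(f"cluster {k}: SCr {c[:, 0].mean():.2f}±{c[:, 0].std():.2f} "
          f"eGFR {c[:, 1].mean():.2f}±{c[:, 1].std():.2f} (n={len(c)})")
```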


Subject(s)
Acute Kidney Injury, Electronic Health Records, Acute Kidney Injury/diagnosis, Aged, Creatinine, Glomerular Filtration Rate, Humans, Middle Aged, Phenotype
10.
Sensors (Basel) ; 20(12)2020 Jun 24.
Article in English | MEDLINE | ID: mdl-32599907

ABSTRACT

Sensor fault detection in wind turbines plays an important role in improving their reliability and stable operation. The supervisory control and data acquisition (SCADA) system of a wind turbine provides promising insights into sensor fault detection due to the accessibility of the data and the abundance of sensor information. However, SCADA data are essentially multivariate time series with inherent spatio-temporal correlation characteristics, which have not been well considered in existing wind turbine fault detection research. This paper proposes a novel classification-based fault detection method for wind turbine sensors. To better capture the spatio-temporal characteristics hidden in SCADA data, a multiscale spatio-temporal convolutional deep belief network (MSTCDBN) was developed to perform feature learning and classification for sensor fault detection. A major advantage of the proposed method is that it can not only learn the spatial correlations between different variables but also capture the temporal characteristics of each variable. Furthermore, its multiscale learning capability can uncover interactions between variables at different filter scales. A generic wind turbine benchmark model was used to evaluate the proposed approach. The comparative results demonstrate that the proposed method can significantly enhance fault detection performance.
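The multiscale idea, parallel convolutions with different kernel widths over multivariate SCADA windows, can be sketched in a few lines; this is not the paper's full convolutional deep belief network, and the channel counts and kernel sizes are assumptions. Assumes `torch`.

```python
import torch
import torch.nn as nn

class MultiscaleConvFeatures(nn.Module):
    def __init__(self, n_channels=8, n_filters=16, scales=(3, 7, 15)):
        super().__init__()
        # One branch per temporal scale; padding keeps lengths aligned.
        self.branches = nn.ModuleList(
            nn.Conv1d(n_channels, n_filters, k, padding=k // 2)
            for k in scales
        )

    def forward(self, x):                    # x: (batch, channels, time)
        feats = [torch.relu(b(x)) for b in self.branches]
        return torch.cat(feats, dim=1)       # stack filters across scales

window = torch.randn(4, 8, 120)              # 4 windows, 8 sensors, 120 steps
print(MultiscaleConvFeatures()(window).shape)  # torch.Size([4, 48, 120])
```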

11.
J Biomed Inform ; 91: 103119, 2019 03.
Article in English | MEDLINE | ID: mdl-30738946

ABSTRACT

OBJECTIVE: Supplementing the Spontaneous Reporting System (SRS) with Electronic Health Record (EHR) data for adverse drug reaction detection could augment sample size, increase population heterogeneity, and cross-validate results for pharmacovigilance research. The difference in the underlying data structures and terminologies between SRS and EHR data presents challenges when attempting to integrate the two into a single database. The Observational Health Data Sciences and Informatics (OHDSI) collaboration provides a Common Data Model (CDM) for organizing and standardizing EHR data to support large-scale observational studies. The objective of this study is to develop and evaluate an informatics platform known as ADEpedia-on-OHDSI, in which spontaneous reporting data from the FDA's Adverse Event Reporting System (FAERS) is converted into the OHDSI CDM format, towards building a next-generation pharmacovigilance signal detection platform. METHODS: An extraction, transformation and loading (ETL) tool was designed, developed, and implemented to convert FAERS data into the OHDSI CDM format. A comprehensive evaluation, including an overall ETL evaluation, a mapping quality evaluation of drug names to RxNorm, and an evaluation of transformation and imputation quality, was then performed to assess mapping accuracy and information loss using FAERS data collected between 2012 and 2017. Previously published findings on the vascular safety profile of triptans were validated using ADEpedia-on-OHDSI. For triptan-related vascular event detection, signals were detected by the Reporting Odds Ratio (ROR) at the MedDRA high-level group term (HLGT), high-level term (HLT), and preferred term (PT) levels using the original FAERS data and the CDM-based FAERS, respectively. In addition, six standardized MedDRA queries (SMQs) related to vascular events were applied. RESULTS: A total of 4,619,362 adverse event cases were loaded into 8 tables in the OHDSI CDM. For drug name mapping, 93.9% of records and 47.0% of unique names were matched with RxNorm codes. Mapping accuracy of drug names was 96% based on manual verification of 500 randomly sampled unique mappings. The information loss evaluation showed that more than 93% of the data is loaded into the OHDSI CDM for most fields, with the exception of drug route data (66%). The replication study detected 5, 18, and 47 triptan-related vascular event signals at the MedDRA HLGT, HLT, and PT levels for the original FAERS data, and 6, 18, and 50 for the CDM-based FAERS, respectively. The signal detection scores of the six vascular SMQs in the raw data study were lower than those in the CDM study. CONCLUSION: This work facilitates seamless integration and combined analyses of both SRS and EHR data for pharmacovigilance in ADEpedia-on-OHDSI, our platform for next-generation pharmacovigilance.
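The ROR disproportionality statistic used above is computed from a 2x2 drug/event contingency table; a minimal sketch with made-up counts (the signal criterion of a lower confidence bound above 1 is the conventional one):

```python
import math

def ror(a, b, c, d):
    """Reporting Odds Ratio with 95% CI.
    a: reports with drug and event      b: drug, other events
    c: other drugs with event           d: other drugs, other events"""
    est = (a / b) / (c / d)
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)
    lo, hi = (math.exp(math.log(est) + s * 1.96 * se) for s in (-1, 1))
    return est, lo, hi

est, lo, hi = ror(a=20, b=480, c=100, d=9400)   # illustrative counts
print(f"ROR={est:.2f} (95% CI {lo:.2f}-{hi:.2f})")  # signal if lo > 1
```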


Subject(s)
Adverse Drug Reaction Reporting Systems, Computer Simulation, Pharmacovigilance, Humans, United States
12.
J Biomed Inform ; 99: 103310, 2019 11.
Article in English | MEDLINE | ID: mdl-31622801

ABSTRACT

BACKGROUND: Standards-based clinical data normalization has become a key component of effective data integration and accurate phenotyping for the secondary use of electronic healthcare records (EHR) data. HL7 Fast Healthcare Interoperability Resources (FHIR) is an emerging clinical data standard for exchanging electronic healthcare data and has been used to model and integrate both structured and unstructured EHR data for a variety of clinical research applications. The overall objective of this study is to develop and evaluate a FHIR-based EHR phenotyping framework for identifying patients with obesity and its multiple comorbidities from semi-structured discharge summaries, leveraging a FHIR-based clinical data normalization pipeline (known as NLP2FHIR). METHODS: We implemented a multi-class, multi-label classification system based on the i2b2 Obesity Challenge task to evaluate the framework. Its two core parts are: (a) the conversion of discharge summaries into the corresponding FHIR resources - Composition, Condition, MedicationStatement, Procedure and FamilyMemberHistory - using the NLP2FHIR pipeline, and (b) the implementation of four machine learning algorithms (logistic regression, support vector machine, decision tree, and random forest) to train classifiers that predict the disease state of obesity and 15 comorbidities using features extracted from standard FHIR resources and terminology expansions. We used macro- and micro-averaged precision (P), recall (R), and F1 score (F1) to evaluate classifier performance, and validated the framework using a second obesity dataset extracted from the MIMIC-III database. RESULTS: Using the NLP2FHIR pipeline, 1237 clinical discharge summaries from the 2008 i2b2 obesity challenge dataset were represented as instances of the FHIR Composition resource, consisting of 5677 records with 16 unique section types. After NLP processing and FHIR modeling, a set of 244,438 FHIR clinical resource instances were generated. Among the four machine learning classifiers, the random forest algorithm performed best, with micro/macro-averaged F1 scores of 0.9466/0.7887 for intuitive classification (reflecting medical professionals' judgments) and 0.9536/0.6524 for textual classification (reflecting judgments based on explicitly reported disease information), respectively. The MIMIC-III obesity dataset was successfully integrated for prediction with minimal reconfiguration of the NLP2FHIR pipeline and machine learning models. CONCLUSIONS: The study demonstrated that the FHIR-based EHR phenotyping approach can effectively identify the state of obesity and multiple comorbidities from semi-structured discharge summaries. Our FHIR-based phenotyping approach is a first concrete step towards improving the data aspect of phenotyping portability across EHR systems and enhancing the interpretability of machine learning-based phenotyping algorithms.
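The multi-label classification set-up can be sketched in miniature with synthetic binary features standing in for the FHIR-derived ones; assumes `scikit-learn`, and the label/feature dimensions are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(200, 30))        # binary FHIR-derived features
W = rng.integers(0, 2, size=(30, 4))
Y = (X @ W > 7).astype(int)                   # 4 correlated disease labels

# One random forest per label via the multi-output wrapper.
clf = MultiOutputClassifier(
    RandomForestClassifier(n_estimators=100, random_state=0))
clf.fit(X[:150], Y[:150])
pred = clf.predict(X[150:])
print("F1-micro:", round(f1_score(Y[150:], pred, average="micro"), 4))
print("F1-macro:", round(f1_score(Y[150:], pred, average="macro"), 4))
```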


Subject(s)
Electronic Health Records/classification, Health Information Interoperability, Obesity/epidemiology, Patient Discharge, Adult, Algorithms, Body Mass Index, Comorbidity, Female, Humans, Machine Learning, Male, Phenotype, Software
13.
BMC Med Inform Decis Mak ; 19(Suppl 3): 78, 2019 04 04.
Article in English | MEDLINE | ID: mdl-30943974

ABSTRACT

BACKGROUND: This paper presents a portable phenotyping system that is capable of integrating both rule-based and statistical machine learning based approaches. METHODS: Our system utilizes UMLS to extract clinically relevant features from the unstructured text and then facilitates portability across different institutions and data systems by incorporating OHDSI's OMOP Common Data Model (CDM) to standardize necessary data elements. Our system can also store the key components of rule-based systems (e.g., regular expression matches) in the format of OMOP CDM, thus enabling the reuse, adaptation and extension of many existing rule-based clinical NLP systems. We experimented with our system on the corpus from i2b2's Obesity Challenge as a pilot study. RESULTS: Our system facilitates portable phenotyping of obesity and its 15 comorbidities based on the unstructured patient discharge summaries, while achieving a performance that often ranked among the top 10 of the challenge participants. CONCLUSION: Our system of standardization enables a consistent application of numerous rule-based and machine learning based classification techniques downstream across disparate datasets which may originate across different institutions and data systems.


Subject(s)
Information Storage and Retrieval, Machine Learning, Natural Language Processing, Electronic Health Records, Humans, Information Storage and Retrieval/methods, Obesity, Pilot Projects
14.
BMC Med Inform Decis Mak ; 19(Suppl 7): 276, 2019 12 23.
Article in English | MEDLINE | ID: mdl-31865899

ABSTRACT

BACKGROUND: The medical community uses a variety of data standards for both clinical and research reporting needs. ISO 11179 Common Data Elements (CDEs) represent one such standard, providing robust data point definitions. Another standard is the Biomedical Research Integrated Domain Group (BRIDG) model, a domain analysis model that provides a contextual framework for biomedical and clinical research data. Mapping the CDEs to the BRIDG model is important; in particular, it can facilitate mapping the CDEs to other standards. Unfortunately, manual mapping, the current method for creating CDE mappings, is error-prone and time-consuming, creating a significant barrier for researchers who utilize CDEs. METHODS: In this work, we developed a semi-automated algorithm to map CDEs to likely BRIDG classes. First, we extended and improved our previously developed artificial neural network (ANN) alignment algorithm. We then used a collection of 1284 CDEs with robust mappings to BRIDG classes as the gold standard to train the model and obtain appropriate weights for six CDE attributes. Afterward, we calculated the similarity between a CDE and each BRIDG class. Finally, the algorithm produces a list of candidate BRIDG classes to which the CDE of interest may belong. RESULTS: For CDEs semantically similar to those used in training, a match rate of over 90% was achieved. For those partially similar, a match rate of 80% was obtained, and for those with drastically different semantics, a match rate of up to 70% was achieved. DISCUSSION: Our semi-automated mapping process reduces the burden on domain experts. All six attribute weights were significant. Experimental results indicate that the availability of training data is more important than the semantic similarity of the testing data to the training data. We addressed the overfitting problem by selecting CDEs randomly and adjusting the ratio of training to verification samples. CONCLUSIONS: Experimental results on real-world use cases demonstrated the effectiveness and efficiency of our proposed methodology in mapping CDEs to BRIDG classes, both for CDEs seen in training and for new, unseen CDEs. In addition, it reduces the mapping burden and improves mapping quality.
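The candidate-ranking step can be pictured as a weighted sum of per-attribute similarities. A hedged sketch: the attribute names, weights, and string-similarity function below are placeholders, not the paper's trained ANN parameters.

```python
from difflib import SequenceMatcher

ATTRS = ["name", "definition", "question", "datatype",
         "value_domain", "context"]
WEIGHTS = dict(zip(ATTRS, [0.30, 0.25, 0.15, 0.10, 0.10, 0.10]))  # assumed

def sim(a, b):
    # Crude string similarity standing in for the learned model.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def rank_candidates(cde, bridg_classes, top_k=3):
    """Score each BRIDG class by a weighted sum over shared attributes
    and return the top-k candidates."""
    scored = []
    for cls in bridg_classes:
        score = sum(WEIGHTS[a] * sim(cde.get(a, ""), cls.get(a, ""))
                    for a in ATTRS)
        scored.append((round(score, 3), cls["name"]))
    return sorted(scored, reverse=True)[:top_k]

cde = {"name": "Subject Birth Date",
       "definition": "date of birth of the study subject"}
classes = [
    {"name": "Person", "definition": "a human being"},
    {"name": "StudySubject", "definition": "a subject in a study"},
    {"name": "PerformedObservation", "definition": "an observation"},
]
print(rank_candidates(cde, classes))
```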


Subject(s)
Biomedical Research, Common Data Elements, Neoplasms, Neural Networks, Computer, Algorithms, Humans, Research Design, Semantics
15.
BMC Med Inform Decis Mak ; 18(Suppl 5): 116, 2018 12 07.
Article in English | MEDLINE | ID: mdl-30526572

ABSTRACT

BACKGROUND: Data heterogeneity is a common phenomenon in the secondary use of electronic health record (EHR) data from different sources. The Observational Health Data Sciences and Informatics (OHDSI) Common Data Model (CDM) organizes healthcare data into standard data structures using concepts that are explicitly and formally specified through standard vocabularies, thereby facilitating large-scale analysis. The objective of this study is to design, develop, and evaluate generic survival analysis routines built on top of the OHDSI CDM. METHODS: We used intrahepatic cholangiocarcinoma (ICC) patient data to implement CDM-based survival analysis methods. Our methods comprise the following modules: 1) mapping local terms to standard OHDSI concepts: variables and values related to demographic characteristics, medical history, smoking status, laboratory results, and tumor features were mapped to standard OHDSI concepts through manual analysis; 2) loading patient data into the CDM using the concept mappings; and 3) developing an R interface that supports portable survival analysis on top of the OHDSI CDM, and comparing the CDM-based analysis results with those obtained using traditional statistical analysis methods. RESULTS: Our dataset contained 346 patients diagnosed with ICC. The collected clinical data contain 115 variables, of which 75 were mapped to OHDSI concepts. These concepts mainly belong to four domains: condition, observation, measurement, and procedure. The corresponding standard concepts are scattered across six vocabularies: ICD10CM, ICD10PCS, SNOMED, LOINC, NDFRT, and READ. We loaded a total of 25,950 patient data records into the OHDSI CDM database. However, 40 variables could not be mapped to the OHDSI CDM, as they mostly belong to imaging and pathological data. CONCLUSIONS: Our study demonstrates that conducting survival analysis using the OHDSI CDM is feasible and can produce reusable analysis routines. Challenges to be overcome include 1) semantic loss caused by inaccurate mapping and value normalization, and 2) incomplete OHDSI vocabularies for describing imaging data, pathological data, and modular data representation.
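A portable survival routine over a CDM-shaped cohort can be sketched with the Python `lifelines` package in place of the paper's R interface; the column names and values below are illustrative.

```python
import pandas as pd
from lifelines import KaplanMeierFitter

# Illustrative cohort extracted from a CDM-style database:
# follow-up time in days and an event indicator per patient.
cohort = pd.DataFrame({
    "person_id": [1, 2, 3, 4, 5, 6],
    "followup_days": [120, 340, 90, 400, 250, 60],
    "death_observed": [1, 0, 1, 0, 1, 1],
})

kmf = KaplanMeierFitter()
kmf.fit(cohort["followup_days"],
        event_observed=cohort["death_observed"],
        label="ICC cohort")
print(kmf.median_survival_time_)
print(kmf.survival_function_.head())
```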


Subject(s)
Big Data, Cholangiocarcinoma, Electronic Health Records, Information Storage and Retrieval, Information Systems, Survival Analysis, Humans
16.
J Biomed Inform ; 67: 90-100, 2017 03.
Article in English | MEDLINE | ID: mdl-28213144

ABSTRACT

BACKGROUND: HL7 Fast Healthcare Interoperability Resources (FHIR) is an emerging open standard for the exchange of electronic healthcare information. FHIR resources are defined in a specialized modeling language. FHIR instances can currently be represented in either XML or JSON. The FHIR and Semantic Web communities are developing a third FHIR instance representation format in the Resource Description Framework (RDF). Shape Expressions (ShEx), a formal RDF data constraint language, is a candidate for describing and validating the FHIR RDF representation. OBJECTIVE: Create a FHIR-to-ShEx model transformation and assess its ability to describe and validate FHIR RDF data. METHODS: We created the methods and tools that generate the ShEx schemas modeling the FHIR-to-RDF specification being developed by the HL7 ITS/W3C RDF Task Force, and evaluated the applicability of ShEx in describing and validating FHIR-to-RDF transformations. RESULTS: The ShEx models contributed significantly to workgroup consensus. Algorithmic transformations from the FHIR model to ShEx schemas, and from FHIR example data to RDF, were incorporated into the FHIR build process. ShEx schemas representing 109 FHIR resources were used to validate 511 FHIR RDF data examples from the Standards for Trial Use (STU 3) Ballot version. We were able to uncover unresolved issues in the FHIR-to-RDF specification and detect 10 types of errors and their root causes in the actual implementation. The FHIR ShEx representations have been included in the official FHIR web pages for the STU 3 Ballot version since September 2016. DISCUSSION: ShEx can be used to define and validate the syntax of a FHIR resource, which is complementary to the use of RDF Schema (RDFS) and the Web Ontology Language (OWL) for semantic validation. CONCLUSION: ShEx proved useful for describing a standard model of FHIR RDF data. The combination of a formal model and a succinct format enabled comprehensive review and automated validation.


Subject(s)
Algorithms, Internet, Semantics, Electronic Health Records, Humans
17.
Soft Matter ; 12(7): 2177-85, 2016 Feb 21.
Article in English | MEDLINE | ID: mdl-26777462

ABSTRACT

Although significant progress has been made in controlling the dispersion of spherical nanoparticles in block copolymer thin films, our ability to disperse and control the assembly of anisotropic nanoparticles into well-defined structures is lacking in comparison. Here we use a combination of experiments and field theoretic simulations to examine the assembly of gold nanorods (AuNRs) in a block copolymer. Experimentally, poly(2-vinylpyridine)-grafted AuNRs (P2VP-AuNRs) are incorporated into poly(styrene)-b-poly(2-vinylpyridine) (PS-b-P2VP) thin films with a vertical cylinder morphology. At sufficiently low concentrations, the AuNRs disperse in the block copolymer thin film. For these dispersed AuNR systems, atomic force microscopy combined with sequential ultraviolet ozone etching indicates that the P2VP-AuNRs segregate to the base of the P2VP cylinders. Furthermore, top-down transmission electron microscopy imaging shows that the P2VP-AuNRs mainly lie parallel to the substrate. Our field theoretic simulations indicate that the NRs are strongly attracted to the cylinder base where they can relieve the local stretching of the minority block of the copolymer. These simulations also indicate conditions that will drive AuNRs to adopt a vertical orientation, namely by increasing nanorod length and/or reducing the wetting of the short block towards the substrate.

18.
J Biomed Inform ; 63: 295-306, 2016 10.
Article in English | MEDLINE | ID: mdl-27597572

ABSTRACT

In this paper, we present an automated method for taxonomy learning, focusing on concept formation and hierarchical relation learning. To infer such relations, we partition the extracted concepts and group them into closely related clusters using Hierarchical Agglomerative Clustering, informed by syntactic matching and semantic relatedness functions. We introduce a novel, unsupervised method for cluster detection based on automated dendrogram pruning, which adapts dynamically to each partition. We evaluate our approach with two different types of textual corpora: clinical trial descriptions and MEDLINE publication abstracts. The results of several experiments indicate that our method is superior to existing dynamic pruning methods and state-of-the-art taxonomy learning methods. It yields higher concept coverage (95.75%) and higher accuracy of learned taxonomic relations (up to 0.71 average precision and 0.96 average recall).
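The clustering step can be sketched with SciPy's agglomerative tools; the per-partition threshold rule here is a naive stand-in for the paper's dynamic dendrogram pruning, and the concept vectors are synthetic.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
# Toy concept vectors: three loose semantic groups.
X = np.vstack([rng.normal(c, 0.3, size=(10, 5)) for c in (0.0, 2.0, 4.0)])

Z = linkage(X, method="average")   # HAC dendrogram (average linkage)
# "Dynamic" cut: a threshold derived from this partition's own merge
# heights rather than a single global constant.
threshold = Z[:, 2].mean() + Z[:, 2].std()
labels = fcluster(Z, t=threshold, criterion="distance")
print("clusters found:", len(set(labels)))
```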


Subject(s)
Cluster Analysis, MEDLINE, Semantics, Unsupervised Machine Learning, Electronic Data Processing, Humans, Knowledge
19.
J Biomed Inform ; 63: 11-21, 2016 10.
Article in English | MEDLINE | ID: mdl-27444185

ABSTRACT

BACKGROUND: Constructing standard and computable clinical diagnostic criteria is an important but challenging research area in the clinical informatics community. The Quality Data Model (QDM) is emerging as a promising information model for standardizing clinical diagnostic criteria. OBJECTIVE: To develop and evaluate automated methods for converting textual clinical diagnostic criteria into a structured format using QDM. METHODS: We used a clinical Natural Language Processing (NLP) tool known as cTAKES to detect sentences and annotate events in diagnostic criteria. We developed a rule-based approach for assigning the QDM datatype(s) to an individual criterion, and invoked a machine learning algorithm based on Conditional Random Fields (CRFs) to annotate the attributes belonging to each particular QDM datatype. We manually developed an annotated corpus as the gold standard and used standard measures (precision, recall and F-measure) for performance evaluation. RESULTS: We harvested 267 individual criteria with the datatypes Symptom and Laboratory Test from 63 textual diagnostic criteria. We manually annotated attributes and values in 142 individual Laboratory Test criteria. The average performance of our rule-based approach was a precision of 0.84, recall of 0.86, and F-measure of 0.85; the CRF-based classification achieved a precision of 0.95, recall of 0.88 and F-measure of 0.91. We also implemented a web-based tool that automatically translates textual Laboratory Test criteria into the QDM XML template format. The results indicate that our approaches leveraging cTAKES and CRFs are effective in facilitating diagnostic criteria annotation and classification. CONCLUSION: Our NLP-based computational framework is a feasible and useful solution for developing diagnostic criteria representation and computerization.
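The rule-based datatype assignment can be pictured as keyword patterns over a criterion string; the rules below are invented for illustration, not the paper's actual rule set.

```python
import re

# Illustrative patterns: each rule maps keyword evidence to a QDM datatype.
RULES = [
    (r"\b(serum|plasma|urine|mg/dL|mmol/L|count|level)\b", "Laboratory Test"),
    (r"\b(pain|fever|nausea|fatigue|headache|swelling)\b", "Symptom"),
]

def assign_datatypes(criterion):
    """Return every QDM datatype whose pattern matches the criterion."""
    found = {dt for pattern, dt in RULES if re.search(pattern, criterion, re.I)}
    return sorted(found) or ["Unclassified"]

print(assign_datatypes("Serum creatinine level greater than 1.5 mg/dL"))
print(assign_datatypes("Persistent fever and headache for 3 days"))
```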


Subject(s)
Algorithms, Data Accuracy, Diagnosis, Computer-Assisted, Natural Language Processing, Humans, Machine Learning
20.
J Biomed Inform ; 62: 232-42, 2016 08.
Article in English | MEDLINE | ID: mdl-27392645

ABSTRACT

The Quality Data Model (QDM) is an information model developed by the National Quality Forum for representing electronic health record (EHR)-based electronic clinical quality measures (eCQMs). In conjunction with the HL7 Health Quality Measures Format (HQMF), QDM contains core elements that make it a promising model for representing EHR-driven phenotype algorithms for clinical research. However, the current QDM specification is available only as descriptive documents suitable for human readability and interpretation, not for machine consumption. The objective of the present study is to develop and evaluate a data element repository (DER) providing machine-readable QDM data element service APIs to support phenotype algorithm authoring and execution. We used the ISO/IEC 11179 metadata standard to capture the structure of each data element, and leveraged Semantic Web technologies to facilitate the semantic representation of these metadata. We observed a number of underspecified areas in the QDM, including the lack of model constraints and pre-defined value sets. We propose harmonization with the models developed in HL7 Fast Healthcare Interoperability Resources (FHIR) and the Clinical Information Modeling Initiative (CIMI) to enhance the QDM specification and enable the extensibility and better coverage of the DER. We also compared the DER with the existing QDM implementation utilized within the Measure Authoring Tool (MAT) to demonstrate the scalability and extensibility of our DER-based approach.
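Representing a QDM data element as machine-readable RDF in an ISO/IEC 11179-style registry can be sketched with `rdflib`; the vocabulary URIs and properties below are invented placeholders, not the study's actual DER ontology.

```python
from rdflib import Graph, Literal, Namespace, RDF

DER = Namespace("http://example.org/der#")   # placeholder vocabulary
g = Graph()
g.bind("der", DER)

# One QDM data element described with registry-style metadata.
elem = DER["LaboratoryTestPerformed"]
g.add((elem, RDF.type, DER.DataElement))
g.add((elem, DER.qdmCategory, Literal("Laboratory Test")))
g.add((elem, DER.attribute, Literal("result")))
g.add((elem, DER.attribute, Literal("negation rationale")))

print(g.serialize(format="turtle"))
```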


Subject(s)
Algorithms, Electronic Health Records, Phenotype, Biomedical Research, Databases, Factual, Humans, Semantics