ABSTRACT
The accuracy of life cycle assessment (LCA) studies is often questioned due to the two grand challenges of life cycle inventory (LCI) modeling: (1) missing foreground flow data and (2) inconsistency in background data matching. Traditional mechanistic methods (e.g., process simulation) and existing machine learning (ML) methods (e.g., similarity-based selection methods) are inadequate due to their limitations in scalability and generalizability. Large language models (LLMs) are well-positioned to address these challenges, given the massive and diverse knowledge acquired during pretraining. Incorporating LLMs into LCI modeling can automate inventory data curation from diverse data sources and enable multimodal analytical capabilities. In this article, we delineate the mechanisms and advantages of LLMs in addressing these two grand challenges. We also discuss future research directions for enhancing the use of LLMs in LCI modeling, including key areas such as improving retrieval augmented generation (RAG), integrating with knowledge graphs, developing prompt engineering strategies, and fine-tuning pretrained LLMs for LCI-specific tasks. The findings from our study serve as a foundation for future research on scalable and automated LCI modeling methods that can provide more appropriate data for LCA calculations.
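As a rough illustration of how RAG could support background data matching, the sketch below retrieves the most similar background process for a foreground flow description and assembles a verification prompt. The flow names, the three-entry database, and the commented-out `llm.generate` call are hypothetical, not taken from the article.

```python
# Illustrative RAG-style retrieval for background data matching in LCI modeling.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical background database entries (ecoinvent-style activity names).
background_flows = [
    "electricity, high voltage, production mix",
    "ammonia production, steam reforming, liquid",
    "transport, freight, lorry 16-32 metric ton",
]

query = "grid electricity consumed by the reactor"  # foreground flow description

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(background_flows + [query])
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()

# Retrieve the top candidate and build a prompt for an LLM to verify the match.
best = scores.argmax()
prompt = (
    f"Foreground flow: '{query}'.\n"
    f"Candidate background process: '{background_flows[best]}'.\n"
    "Is this a suitable match for LCI modeling? Answer yes/no with a reason."
)
# response = llm.generate(prompt)  # placeholder: any LLM client could be used here
print(prompt)
```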
ABSTRACT
BACKGROUND: The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) can be used to transform observational health data to a common format. CDM transformation allows for analysis across disparate databases for the generation of new, real-world evidence, which is especially important in rare diseases, where data are limited. Pulmonary hypertension (PH) is a progressive, life-threatening disease, with rare subgroups such as pulmonary arterial hypertension (PAH), for which generating real-world evidence is challenging. Our objective is to document the process and outcomes of transforming registry data in PH to the OMOP CDM, and to highlight challenges and our potential solutions. METHODS: Three observational studies were transformed from the Clinical Data Interchange Standards Consortium study data tabulation model (SDTM) to OMOP CDM format. OPUS was a prospective, multi-centre registry (2014-2020) and OrPHeUS was a retrospective, multi-centre chart review (2013-2017); both enrolled patients newly treated with macitentan in the US. EXPOSURE is a prospective, multi-centre cohort study (2017-ongoing) of patients newly treated with selexipag or any PAH-specific therapy in Europe and Canada. OMOP CDM version 5.3.1 with a recent OMOP CDM vocabulary was used. Imputation rules were defined and applied for missing dates to avoid exclusion of data. Custom target concepts were introduced when existing concepts did not provide sufficient granularity. RESULTS: Of the 6622 patients in the three registry studies, records were mapped for 6457. Custom target concepts were introduced for PAH subgroups (by combining SNOMED concepts or creating custom concepts) and World Health Organization functional class. Per the OMOP CDM convention, records about the absence of an event, or the lack of information, were not mapped. Excluding these non-event records, 4% (OPUS), 2% (OrPHeUS) and 1% (EXPOSURE) of records were not mapped. CONCLUSIONS: SDTM data from three registries were transformed to the OMOP CDM with limited exclusion of data and deviation from the SDTM database content. Future researchers can apply our strategy and methods in different disease areas, with tailoring as necessary. Mapping registry data to the OMOP CDM facilitates more efficient collaborations between researchers and establishment of federated data networks, which is an unmet need in rare diseases.
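The abstract mentions imputation rules for missing dates but does not spell them out; the following is a minimal sketch assuming a midpoint convention (missing day becomes the 15th, missing month becomes June), purely for illustration.

```python
# Hypothetical imputation rules for partial dates; the paper defines its own
# rules, so these conventions are illustrative only.
from datetime import date

def impute_partial_date(year, month=None, day=None):
    """Return a complete date, imputing missing parts conservatively."""
    if year is None:
        return None                              # cannot impute without a year
    month = month if month is not None else 6    # assume mid-year
    day = day if day is not None else 15         # assume mid-month
    return date(year, month, day)

print(impute_partial_date(2016))        # 2016-06-15
print(impute_partial_date(2016, 3))     # 2016-03-15
```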
Subjects
Pulmonary Hypertension, Cohort Studies, Factual Databases, Electronic Health Records, Humans, Pulmonary Hypertension/epidemiology, Prospective Studies, Registries, Retrospective Studies
ABSTRACT
BACKGROUND: The Experimental Factor Ontology (EFO) is an application ontology driven by experimental variables, including cell lines, that organizes and describes the diverse experimental variables and data residing in EMBL-EBI resources. The Cell Line Ontology (CLO) is an OBO community-based ontology that contains information on immortalized cell lines and relevant experimental components. EFO integrates and extends ontologies from the bio-ontology community to drive a number of practical applications. It is desirable for the community to share design patterns, and therefore for EFO to reuse the cell line representation from CLO. There are, however, challenges to be addressed when developing a common ontology design pattern for representing cell lines in both EFO and CLO. RESULTS: In this study, we developed a strategy to compare and map cell line terms between EFO and CLO. We examined Cellosaurus resources for EFO-CLO cross-references. Text labels of cell lines from both ontologies were verified against the biological information axiomatized in each source. The study resulted in the identification of 873 EFO-CLO-aligned and 344 EFO-unique immortalized cell lines. All of these cell lines were updated in CLO and the related cell line information was merged. A design pattern that integrates EFO and CLO was also developed. CONCLUSION: Our study compared, aligned, and synchronized cell line information between CLO and EFO. The final updated CLO will be examined as the candidate ontology from which to import and replace eligible EFO cell line classes, thereby supporting interoperability in the bio-ontology domain. Our mapping pipeline illustrates the use of ontologies in aiding biological data standardization and integration through the biological and semantic content of cell lines.
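A toy version of the label-based alignment step might look like the sketch below; the identifiers and labels are illustrative stand-ins, and the real pipeline also checks Cellosaurus cross-references and axiomatized biological information.

```python
# Minimal sketch of label-based cell line alignment between two ontologies.
def normalize(label):
    return "".join(ch for ch in label.lower() if ch.isalnum())

# Invented example terms, not actual EFO/CLO content.
efo_terms = {"EFO:0001086": "HeLa", "EFO:0002067": "K-562"}
clo_terms = {"CLO:0003684": "HeLa cell", "CLO:0007050": "K562 cell"}

clo_index = {normalize(lbl.removesuffix(" cell")): iri
             for iri, lbl in clo_terms.items()}

for efo_iri, label in efo_terms.items():
    match = clo_index.get(normalize(label))
    status = f"aligned with {match}" if match else "EFO-unique"
    print(f"{efo_iri} ({label}): {status}")
```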
Subjects
Algorithms, Biological Ontologies, Cell Physiological Phenomena, Computational Biology/methods, Factual Databases, Gene Expression Profiling, Cell Line, Data Mining, Humans, Semantics
ABSTRACT
Since the start of the fourth national census of traditional Chinese medicine resources in 2011, a large amount of data has been collected and compiled, including data on wild medicinal plant resources, information on medicinal plant cultivation, traditional knowledge, and specimen information. The traditional paper-based recording method is inconvenient for querying and application. A browser/server (B/S) architecture, the Java Web framework, and a service-oriented architecture (SOA) were used to design and develop the fourth national census results display platform. Through data integration and sorting, users are provided with integrated data services and data query and display solutions. The platform implements fine-grained data classification and offers simple data retrieval and statistical analysis functions. The platform uses ECharts components, GeoServer, OpenLayers, and other technologies to provide a variety of data display forms such as charts, maps, and other visualizations, intuitively reflecting the number, distribution, and types of Chinese materia medica resources. It meets the data mapping requirements of different levels of users and provides support for management decision-making.
Subjects
Database Management Systems, Chinese Herbal Drugs, Materia Medica, Traditional Chinese Medicine, Medicinal Plants, China, Surveys and Questionnaires
ABSTRACT
According to the regulation "Decreto del Presidente del Consiglio dei Ministri" (DPCM) of September 29, 2015, no. 178, the Logical Observation Identifiers Names and Codes (LOINC) system is included among the coding systems adopted in the Italian Electronic Health Record (EHR). As part of the Digital Health Solutions in Community Medicine (DHEAL-COM) project, one key goal is to categorize parameters using international classification systems. This enables the identification of appropriate Information and Communication Technology (ICT) solutions tailored to support people's health needs. Our objective is to incorporate LOINC codes for parameter categorization, thus anticipating the future use of the EHR.
Subjects
Electronic Health Records, Logical Observation Identifiers Names and Codes, Italy, Systems Integration, Humans, Medical Record Linkage
ABSTRACT
The integration of data from various healthcare centers into disease registries is pivotal for facilitating collaborative research and enhancing clinical insights. In this study, we investigate the integration process of existing registries into the PVRI GoDeep meta-registry, focusing on the complexities and challenges encountered. We detail the integration process, including data transformation, mapping updates, and feedback mechanisms. Our findings underscore the importance of standardized processes and proactive communication in addressing data quality issues, ultimately enhancing the reliability and trustworthiness of meta-registry data. Through careful harmonization of the data and transparent documentation of data processing, we pave the way for leveraging registry data to drive advancements in pulmonary hypertension research and patient care.
Subjects
Pulmonary Hypertension, Registries, Humans
ABSTRACT
We analyze leading journals in behavioral finance to identify the most-used keywords in the area and how they have evolved. Using keyword analysis of data from 2000 to 2020 together with data mapping and visualization tools, we constructed a dynamic map of the discipline. This study assesses the state of the art of the field, the main topics of discussion, the relationships that arise between the concepts discussed, and emerging issues of interest. The sample comprises 3876 documents and 15,859 keywords from the journals responsible for the growth of the discipline, namely the Journal of Behavioral and Experimental Economics, Journal of Behavioral and Experimental Finance, Journal of Economic Psychology, Journal of Behavioral Finance, and Review of Behavioral Finance. During the period analyzed, our results depict a lively area and highlight the prominent role that experiments play in the field. Two related but distinct streams of behavioral finance research are revealed.
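A keyword map of this kind is typically built from keyword frequencies and co-occurrence counts; the toy sketch below shows the counting step on invented records, not the study's actual sample.

```python
# Toy keyword co-occurrence counting of the kind that underlies a map of a
# discipline; the records are invented examples.
from collections import Counter
from itertools import combinations

records = [
    ["overconfidence", "experiment", "trading volume"],
    ["herding", "experiment", "market efficiency"],
    ["overconfidence", "herding"],
]

keyword_counts = Counter(kw for rec in records for kw in rec)
cooccurrence = Counter(
    pair for rec in records for pair in combinations(sorted(set(rec)), 2)
)

print(keyword_counts.most_common(3))   # node sizes in the map
print(cooccurrence.most_common(3))     # edge weights in the map
```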
ABSTRACT
This paper explores the challenges and lessons learned during the mapping of HL7 v2 messages, structured using a custom schema, to openEHR for the Medical Data Integration Center (MeDIC) of the University Hospital Schleswig-Holstein (UKSH). Missing timestamps in observations, missing units of measurement, inconsistencies in decimal separators, and unexpected datatypes were identified as critical data quality issues in this process. These anomalies highlight the difficulty of automating the transformation of HL7 v2 data to any standard, particularly openEHR, using off-the-shelf tools. Addressing these anomalies is crucial for enhancing data interoperability, supporting evidence-based research, and optimizing clinical decision-making. Implementing proper data quality measures and governance will unlock the potential of integrated clinical data, empowering clinicians and researchers and fostering a robust healthcare ecosystem.
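The anomaly classes named above can be screened for with simple checks before mapping; the sketch below is a hypothetical example of such pre-mapping cleaning, not MeDIC's actual tooling.

```python
# Sketch of cleaning an HL7 v2 OBX-like observation before mapping to openEHR;
# field values are simplified examples.
def clean_observation(value, unit, timestamp):
    issues = []
    if timestamp in (None, ""):
        issues.append("missing timestamp")   # cannot anchor the event in time
    if unit in (None, ""):
        issues.append("missing unit")
    # Normalize a German-style decimal separator ("3,5" -> "3.5").
    normalized = value.replace(",", ".") if isinstance(value, str) else value
    try:
        numeric = float(normalized)
    except (TypeError, ValueError):
        numeric = None
        issues.append(f"unexpected datatype: {value!r}")
    return numeric, issues

print(clean_observation("3,5", "mmol/l", "20230101T1200"))  # (3.5, [])
print(clean_observation("n/a", "", None))                   # flags all three issues
```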
Subjects
Health Level Seven, Electronic Health Records, Health Information Interoperability, Germany, Systems Integration, Humans, Medical Record Linkage/methods
ABSTRACT
Electron microscopy is a valuable tool for elucidating the three-dimensional structures of macromolecular complexes. As the field matures and the number of solved structures increases, the existence of infrastructures that keep this information organized and accessible is crucial. At the same time, standards and clearly described conventions facilitate software maintenance, benefit interoperability with other packages and allow data interchange. This work describes three developments promoting integrative biology, standardization and workflow processing, namely PeppeR, the EMX initiative and Scipion.
Subjects
Information Dissemination, Electron Microscopy/methods, Software, Algorithms, Computer-Assisted Image Processing/methods
ABSTRACT
The COVID-19 pandemic and the digitalization of medical services present significant challenges for the medical sector of the European Union, with profound implications for health systems and the provision of high-performance public health services. The sustainability and resilience of health systems depend on the introduction of information and communication technology into health processes and services, eliminating vulnerabilities that can have significant consequences for health, social cohesion, and economic progress. This research aims to assess the impact of digitalization on several dimensions of health, taking into account the specific implications of the COVID-19 pandemic. The research methodology consists of three procedures: cluster analysis performed through vector quantization, agglomerative clustering, and an analytical approach consisting of data mapping. The main results highlight the importance of effective national responses and provide recommendations, priorities, and objectives to strengthen health systems at the European level. Finally, the results reveal the need to reduce the gaps between EU member states and for a new approach to policy, governance, investment, health spending, and the effective provision of digital services.
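Vector quantization clustering is commonly realized with k-means; a minimal sketch of both clustering procedures on invented country-level indicators follows (the data and feature choices are assumptions, not the study's dataset).

```python
# Sketch of the two clustering procedures on hypothetical country-level
# digital-health indicators.
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

# rows: countries; columns: e.g., e-health usage %, broadband coverage %, spending
X = np.array([[72, 88, 9.8], [45, 70, 6.1], [80, 92, 11.0], [38, 61, 5.4]])
X_std = StandardScaler().fit_transform(X)  # put indicators on a common scale

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_std)
agglo_labels = AgglomerativeClustering(n_clusters=2).fit_predict(X_std)

print(kmeans_labels, agglo_labels)  # compare the two cluster assignments
```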
Subjects
COVID-19, COVID-19/epidemiology, European Union, Government Programs, Humans, Medical Assistance, Pandemics
ABSTRACT
RNA sequencing can nowadays be considered the gold standard for studying the coding and noncoding transcriptome. The great advantage of high-throughput sequencing in the characterization and quantification of long noncoding RNAs (lncRNAs) resides in its capability to capture the complexity of lncRNA transcript configurations, even in the presence of several alternative isoforms, with superior accuracy and discovery power compared to other technologies such as microarrays or PCR-based methods. In this chapter, we provide a protocol for lncRNA analysis through high-throughput sequencing, indicating the main difficulties in the annotation pipeline and showing how an accurate evaluation of the procedure can help to minimize biased observations.
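One routine step in such an annotation pipeline is isolating lncRNA entries from a reference annotation; the sketch below filters a GENCODE-style GTF line and assumes the `gene_type "lncRNA"` attribute convention used by recent GENCODE releases.

```python
# Sketch of pulling lncRNA gene entries out of a GTF annotation, a typical
# early step in an lncRNA annotation pipeline (the line shown is illustrative).
gtf_line = (
    'chr1\tHAVANA\tgene\t29554\t31109\t.\t+\t.\t'
    'gene_id "ENSG00000243485"; gene_type "lncRNA"; gene_name "MIR1302-2HG";'
)

def is_lncrna(line):
    if line.startswith("#"):          # skip GTF header comments
        return False
    fields = line.rstrip("\n").split("\t")
    return fields[2] == "gene" and 'gene_type "lncRNA"' in fields[8]

print(is_lncrna(gtf_line))  # True
```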
Subjects
Computational Biology/methods, Gene Expression Profiling/methods, Long Noncoding RNA/genetics, RNA Sequence Analysis, Transcriptome, Algorithms, Statistical Data Interpretation, Genetic Databases, High-Throughput Nucleotide Sequencing/methods, Molecular Sequence Annotation, RNA Sequence Analysis/methods, Software, User-Computer Interface, Web Browser
ABSTRACT
Integration of heterogeneous data sources into a single representation is an active field with many different tools and techniques. In the case of text-based approaches (those that base the definition of the mappings and the integration on a DSL), there is a lack of usability studies. In this work we conducted a usability experiment (n = 17) on three different languages: ShExML (our own language), YARRRML, and SPARQL-Generate. Results show that ShExML users tend to perform better than those of YARRRML and SPARQL-Generate. This study sheds light on usability aspects of these languages' design and highlights some areas for improvement.
ABSTRACT
Nursing Minimum Data Sets (NMDS) are intended to systematically describe nursing care. Until now, NMDS have been populated with nursing data through manual data ascertainment, which is inefficient. The objective of this work was to evaluate an automated mapping pipeline for transforming nursing data into an NMDS. We used LEP Nursing 3 data as source data and the Austrian and German NMDS as target formats. Based on a human expert mapping between LEP and NMDS, an automated data mapping algorithm was developed and implemented in an automatic mapping pipeline. The results show that most LEP nursing interventions can be matched to the NMDS-AT and G-NMDS, and that a fully automated mapping process from LEP Nursing 3 data to NMDS-AT performs effectively and efficiently. The approach shown here can also be used to map different nursing classifications and to automatically transform point-of-care nursing data into nursing minimum data sets.
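At its core, such a pipeline applies an expert-derived lookup table to each recorded intervention; the sketch below uses invented codes to show the shape of that transformation, not the actual LEP or NMDS-AT code systems.

```python
# Automated mapping pipeline in miniature: an expert-derived lookup table
# drives the transformation (codes shown are invented placeholders).
lep_to_nmds = {
    "LEP:mobilization": "NMDS-AT:movement_support",
    "LEP:wound_care": "NMDS-AT:skin_and_wound_care",
}

def transform(lep_interventions):
    mapped, unmapped = [], []
    for code in lep_interventions:
        target = lep_to_nmds.get(code)
        (mapped if target else unmapped).append(target or code)
    return mapped, unmapped

print(transform(["LEP:mobilization", "LEP:unknown_item"]))
# (['NMDS-AT:movement_support'], ['LEP:unknown_item'])
```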
Subjects
Factual Databases, Nursing Research, Austria, Humans, Nursing Records
ABSTRACT
BACKGROUND: The new European legislation on data protection, namely, the General Data Protection Regulation (GDPR), has introduced comprehensive requirements for the documentation about the processing of personal data as well as informing the data subjects of its use. GDPR's accountability principle requires institutions, projects, and data hubs to document their data processings and demonstrate compliance with the GDPR. In response to this requirement, we see the emergence of commercial data-mapping tools, and institutions creating GDPR data register with such tools. One shortcoming of this approach is the genericity of tools, and their process-based model not capturing the project-based, collaborative nature of data processing in biomedical research. FINDINGS: We have developed a software tool to allow research institutions to comply with the GDPR accountability requirement and map the sometimes very complex data flows in biomedical research. By analysing the transparency and record-keeping obligations of each GDPR principle, we observe that our tool effectively meets the accountability requirement. CONCLUSIONS: The GDPR is bringing data protection to center stage in research data management, necessitating dedicated tools, personnel, and processes. Our tool, DAISY, is tailored specifically for biomedical research and can help institutions in tackling the documentation challenge brought about by the GDPR. DAISY is made available as a free and open source tool on Github. DAISY is actively being used at the Luxembourg Centre for Systems Biomedicine and the ELIXIR-Luxembourg data hub.
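A record-of-processing entry can be modeled as a small structured type; the dataclass below is a plausible subset of fields inspired by GDPR Article 30 record-keeping, not DAISY's actual schema.

```python
# Sketch of a record-of-processing entry of the kind a GDPR register holds;
# the fields are illustrative assumptions, not DAISY's data model.
from dataclasses import dataclass, field

@dataclass
class ProcessingRecord:
    project: str
    purpose: str
    legal_basis: str
    data_subjects: list = field(default_factory=list)
    recipients: list = field(default_factory=list)

record = ProcessingRecord(
    project="Cohort-X",
    purpose="biomarker discovery",
    legal_basis="consent",
    data_subjects=["study participants"],
)
print(record)
```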
Subjects
Computer Security/legislation & jurisprudence, Electronic Health Records, Europe, Humans, Social Responsibility
ABSTRACT
The use of mass spectrometry-based metabolomics to study human, plant and microbial biochemistry and their interactions with the environment largely depends on the ability to annotate metabolite structures by matching mass spectral features of the measured metabolites to curated spectra of reference standards. While reference databases for metabolomics now provide information for hundreds of thousands of compounds, barely 5% of these known small molecules have experimental data from pure standards. Remarkably, it is still unknown how well existing mass spectral libraries cover the biochemical landscape of prokaryotic and eukaryotic organisms. To address this issue, we have investigated the coverage of 38 genome-scale metabolic networks by public and commercial mass spectral databases, and found that on average only 40% of nodes in metabolic networks could be mapped by mass spectral information from standards. Next, we deciphered computationally which parts of the human metabolic network are poorly covered by mass spectral libraries, revealing gaps in the eicosanoids, vitamins and bile acid metabolism. Finally, our network topology analysis based on the betweenness centrality of metabolites revealed the top 20 most important metabolites that, if added to MS databases, may facilitate human metabolome characterization in the future.
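The topology analysis described above ranks metabolites by betweenness centrality; the sketch below runs that computation on a tiny invented metabolite graph, not the study's 38 genome-scale networks.

```python
# Rank metabolites in a toy network by betweenness centrality; the graph is
# invented for illustration.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("glucose", "g6p"), ("g6p", "f6p"), ("f6p", "pyruvate"),
    ("pyruvate", "acetyl-coa"), ("acetyl-coa", "citrate"), ("g6p", "ribose5p"),
])

centrality = nx.betweenness_centrality(G)
top = sorted(centrality.items(), key=lambda kv: kv[1], reverse=True)[:3]
print(top)  # hub metabolites that would most improve MS database coverage
```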
ABSTRACT
Background & Objectives: Legacy laboratory test codes make it difficult to use clinical datasets for meaningful translational research, where populations are followed for disease risk and outcomes over many years. The Health Informatics Centre (HIC) at the University of Dundee hosts continuous biochemistry data from the clinical laboratories in Tayside and Fife dating back as far as 1987. However, the HIC-managed biochemistry dataset is coupled with incoherent sample types and unstandardised legacy local test codes, which complicates the use of the dataset for population health outcomes research. The objective of this study was to map the legacy local test codes to the Scottish 5-byte Version 2 Read Codes using biochemistry data extracted from the repository of the Scottish Care Information (SCI) Store. METHODS: A data mapping methodology was used to map legacy local test codes from clinical biochemistry laboratories within Tayside and Fife to the Scottish 5-byte Version 2 Read Codes. RESULTS: The methodology resulted in the mapping of 485 legacy laboratory test codes, spanning 25 years, to 124 Read Codes. CONCLUSION: The data mapping methodology not only facilitated the restructuring of the HIC-managed biochemistry dataset to support easier cohort identification and selection, but also made it easier for the standardised local laboratory test codes, in the Scottish 5-byte Version 2 Read Codes, to be mapped to other health data standards such as Clinical Terms Version 3 (CTV3), LOINC, and SNOMED CT.
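The 485-to-124 figure implies a many-to-one mapping; the sketch below shows that shape as a lookup join, with placeholder codes rather than the actual Tayside/Fife or Read Codes.

```python
# Many-to-one mapping of legacy lab test codes via a lookup join; all codes
# are invented placeholders.
import pandas as pd

results = pd.DataFrame({
    "legacy_code": ["SOD1", "POT2", "GLU9"],
    "value": [140, 4.2, 5.6],
})
mapping = pd.DataFrame({
    "legacy_code": ["SOD1", "SOD7", "POT2", "GLU9"],   # two legacy sodium codes
    "read_code": ["R001.", "R001.", "R002.", "R003."],  # map to one Read Code
})

mapped = results.merge(mapping, on="legacy_code", how="left")
print(mapped)  # unmatched legacy codes would surface as NaN for review
```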
Subjects
Clinical Laboratory Information Systems, Systems Integration, Data Accuracy, Data Curation, Humans, Scotland
ABSTRACT
OBJECTIVE: The Appraisal of Guidelines for Research and Evaluation (AGREE) is a representative, quantitative evaluation tool for evidence-based clinical practice guidelines (CPGs). Recently, AGREE was revised (AGREE II). The continuity of evaluation data obtained from the original version (AGREE I) has not yet been demonstrated. The present study investigated the relationship between data obtained from AGREE I and AGREE II to evaluate the continuity between the two measurement tools. RESULTS: An evaluation team consisting of three trained librarians evaluated 68 CPGs issued in 2011-2012 in Japan using AGREE I and AGREE II. The correlation coefficients for the six domains were: (1) scope and purpose 0.758; (2) stakeholder involvement 0.708; (3) rigor of development 0.982; (4) clarity of presentation 0.702; (5) applicability 0.919; and (6) editorial independence 0.971. The item "Overall Guideline Assessment" was newly introduced in AGREE II. This global item had a correlation coefficient of 0.628 using the six AGREE I domains, and 0.685 using the 23 items. Our results suggest that data obtained from AGREE I can be transferred to AGREE II, and the "Overall Guideline Assessment" data can be determined with high reliability using a standardized score of the 23 items.
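The "standardized score" referred to is, in the AGREE II manual, the obtained score scaled between the minimum and maximum possible scores for a domain; a small worked example follows (the ratings are invented).

```python
# Standardized domain score as defined in the AGREE II manual: the obtained
# score scaled between the minimum and maximum possible scores.
def standardized_score(item_scores, min_item=1, max_item=7):
    obtained = sum(item_scores)
    n = len(item_scores)
    min_possible, max_possible = min_item * n, max_item * n
    return (obtained - min_possible) / (max_possible - min_possible)

# Three appraisers x two items in a domain (AGREE II items are rated 1-7).
print(f"{standardized_score([6, 5, 7, 6, 5, 6]):.2f}")  # 0.81
```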
Subjects
Evaluation Studies as Topic, Evidence-Based Medicine, Practice Guidelines as Topic, Japan
ABSTRACT
Most patients with chronic disease are prescribed multiple medications, which are recorded in their personal health records. This is rich information for clinical and public health researchers, but also a challenge to analyse. This paper describes the method undertaken within the Public Health Research Data Management System (PHReDMS) to map medication data retrieved from individual patient health records for population health researchers' use. The PHReDMS manages clinical, health service, community, and survey research data within a secure web environment that allows for data sharing amongst researchers. The PHReDMS is currently used by researchers to answer a broad range of questions, including monitoring of prescription patterns in different population groups and geographic areas with high incidence/prevalence of chronic renal, cardiovascular, metabolic, and mental health issues. In this paper, we present the general notion of an abstraction network, a higher-level network that sits above a terminology and offers a compact and more easily understandable view of its content. We demonstrate the use of abstraction network methodology to examine medication data from electronic medical records, allowing a compact and more easily understandable view of their content.
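One simple way to see the abstraction idea is to roll individual drug codes up to a coarser classification level; the sketch below groups example ATC codes by their first level. The codes are examples only, and the paper's abstraction network is defined over its own terminology, not necessarily ATC.

```python
# Illustrative abstraction step: summarize medication records by ATC class
# prefix, giving the compact higher-level view an abstraction network provides.
from collections import Counter

prescriptions = ["C09AA02", "C09AA05", "A10BA02", "N06AB06", "A10BB01"]

# First ATC level (anatomical main group) as a coarse abstraction layer.
by_group = Counter(code[0] for code in prescriptions)
print(by_group)  # Counter({'C': 2, 'A': 2, 'N': 1})
```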
ABSTRACT
Electrocardiography is one of the most important non-invasive diagnostic tools for diagnosing coronary heart disease. The electrocardiography information system in Maharaj Nakorn Chiang Mai Hospital previously required a massive manual labor effort. In this article, we propose an approach to the integration of heterogeneous electrocardiography data and the implementation of an integrated electrocardiography information system into the existing Hospital Information System. The system integrates different electrocardiography formats into a consistent electrocardiography rendering using Java software. The interface acts as middleware to seamlessly integrate the different electrocardiography formats. Instead of using a common electrocardiography protocol, we applied a central format based on Java classes for mapping the different electrocardiography formats, with a specific parser for each format to acquire the same information. Our observations showed that the new system improved the effectiveness of data management, workflow, and data quality; increased the availability of information; and ultimately improved quality of care.
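The paper's middleware is written in Java; the Python sketch below shows the same parser-per-format pattern feeding a shared central record (both input formats are invented for illustration).

```python
# Parser-per-format middleware pattern: each source format gets its own parser,
# and all parsers emit the same central record structure.
def parse_format_a(raw):
    # hypothetical "name=value" format
    pairs = dict(item.split("=") for item in raw.split(";"))
    return {"heart_rate": int(pairs["HR"]), "source": "format_a"}

def parse_format_b(raw):
    # hypothetical comma-separated format: patient_id,heart_rate
    _, hr = raw.split(",")
    return {"heart_rate": int(hr), "source": "format_b"}

PARSERS = {"a": parse_format_a, "b": parse_format_b}

def to_central_format(fmt, raw):
    return PARSERS[fmt](raw)

print(to_central_format("a", "HR=72;QT=400"))
print(to_central_format("b", "patient-1,68"))
```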
Subjects
Electrocardiography/methods, Electrocardiography/statistics & numerical data, Hospital Information Systems/statistics & numerical data, Reference Standards, Electronic Health Records/trends, Humans, Informatics/methods, Thailand
ABSTRACT
This work is focused on mapping biomedical datasets to a common representation, as an integral part of data harmonization for integrated biomedical data access and sharing. We present GEM, an intelligent software assistant for automated data mapping across different datasets or from a dataset to a common data model. The GEM system automates data mapping by providing precise suggestions for data element mappings. It leverages the detailed metadata about elements in associated dataset documentation, such as the data dictionaries that are typically available with biomedical datasets. It employs unsupervised text mining techniques to determine similarity between data elements, and machine-learning classifiers to identify element matches. It further provides an active-learning capability that optimizes the process of training the GEM system. Our experimental evaluations show that the GEM system provides highly accurate data mappings (over 90% accuracy) for real datasets of thousands of data elements each in the Alzheimer's disease research domain. Further, the effort required to train the system for new datasets is also optimized. We are currently employing the GEM system to map Alzheimer's disease datasets from around the globe into a common representation, as part of a global Alzheimer's disease integrated data sharing and analysis network called GAAIN. GEM achieves significantly higher data mapping accuracy for biomedical datasets than other state-of-the-art tools for database schema matching with similar functionality. With the use of active-learning capabilities, the user effort in training the system is minimal.
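A minimal version of the unsupervised similarity step might compare data dictionary descriptions with TF-IDF vectors, as sketched below; the element names and descriptions are invented examples, not GAAIN's dictionaries, and GEM additionally applies trained classifiers and active learning on top of such scores.

```python
# Element matching via TF-IDF similarity between data dictionary descriptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

source = {"mmse_total": "Mini-Mental State Examination total score",
          "apoe4": "APOE epsilon 4 allele carrier status"}
target = {"MMSE": "total score on the mini mental state exam",
          "APOE_E4": "carrier of the apolipoprotein E e4 allele"}

docs = list(source.values()) + list(target.values())
tfidf = TfidfVectorizer().fit_transform(docs)
sims = cosine_similarity(tfidf[: len(source)], tfidf[len(source):])

# Suggest the best target element for each source element.
for i, s_name in enumerate(source):
    j = sims[i].argmax()
    print(f"{s_name} -> {list(target)[j]} (similarity {sims[i, j]:.2f})")
```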