Results 1 - 20 of 29
1.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37478371

ABSTRACT

Artificial intelligence (AI) systems utilizing deep neural networks and machine learning (ML) algorithms are widely used for solving critical problems in bioinformatics, biomedical informatics and precision medicine. However, complex ML models that are often perceived as opaque and black-box methods make it difficult to understand the reasoning behind their decisions. This lack of transparency can be a challenge for both end-users and decision-makers, as well as AI developers. In sensitive areas such as healthcare, explainability and accountability are not only desirable properties but also legally required for AI systems that can have a significant impact on human lives. Fairness is another growing concern, as algorithmic decisions should not show bias or discrimination towards certain groups or individuals based on sensitive attributes. Explainable AI (XAI) aims to overcome the opaqueness of black-box models and to provide transparency in how AI systems make decisions. Interpretable ML models can explain how they make predictions and identify factors that influence their outcomes. However, the majority of the state-of-the-art interpretable ML methods are domain-agnostic and have evolved from fields such as computer vision, automated reasoning or statistics, making direct application to bioinformatics problems challenging without customization and domain adaptation. In this paper, we discuss the importance of explainability and algorithmic transparency in the context of bioinformatics. We provide an overview of model-specific and model-agnostic interpretable ML methods and tools and outline their potential limitations. We discuss how existing interpretable ML methods can be customized and fit to bioinformatics research problems. Further, through case studies in bioimaging, cancer genomics and text mining, we demonstrate how XAI methods can improve transparency and decision fairness. 
Our review aims to provide valuable insights and serve as a starting point for researchers who want to enhance explainability and decision transparency while solving bioinformatics problems. GitHub: https://github.com/rezacsedu/XAI-for-bioinformatics.
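To make the model-agnostic interpretability methods surveyed above concrete, here is a minimal sketch (not from the paper or its repository): scikit-learn's permutation importance applied to a gradient-boosting classifier, with synthetic data standing in for an omics feature matrix.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

# Synthetic stand-in for an omics feature matrix (10 features, 3 informative)
X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Model-agnostic explanation: permute each feature and measure the score drop
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
ranked = sorted(enumerate(result.importances_mean), key=lambda t: -t[1])
for idx, score in ranked[:3]:
    print(f"feature_{idx}: mean importance {score:.3f}")
```

The same pattern applies to any fitted estimator, which is what makes it model-agnostic; domain adaptation, as the review notes, lies in choosing biologically meaningful features to permute.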


Subjects
Artificial Intelligence, Computational Biology, Humans, Machine Learning, Algorithms, Genomics
2.
Article in German | MEDLINE | ID: mdl-38750239

ABSTRACT

Health data are extremely important in today's data-driven world. Through automation, healthcare processes can be optimized and clinical decisions supported. For any reuse of data, quality, validity, and trustworthiness are essential; only then can data be reused sensibly. Specific requirements for the description and coding of reusable data are defined in the FAIR (Findable, Accessible, Interoperable, Reusable) guiding principles for data stewardship. Various national research associations and infrastructure projects in the German healthcare sector have already clearly positioned themselves on the FAIR principles: both the infrastructures of the Medical Informatics Initiative and the University Medicine Network operate explicitly on the basis of the FAIR principles, as do the National Research Data Infrastructure for Personal Health Data and the German Center for Diabetes Research. To ensure that a resource complies with the FAIR principles, the degree of FAIRness should first be determined (a so-called FAIR assessment), followed by prioritized improvement steps (so-called FAIRification). Since 2016, a set of tools and guidelines has been developed for both steps, based on the different, domain-specific interpretations of the FAIR principles. Neighboring European countries have also invested in the development of national frameworks for semantic interoperability in the context of the FAIR principles. Concepts for comprehensive data enrichment were developed to simplify data analysis, for example, in the European Health Data Space or via the Observational Health Data Sciences and Informatics network. With the support of the European Open Science Cloud, among others, structured FAIRification measures have already been taken for German health datasets.


Subjects
Electronic Health Records, Humans, Germany, Internationality, National Health Programs
3.
Brief Bioinform ; 22(1): 393-415, 2021 01 18.
Article in English | MEDLINE | ID: mdl-32008043

ABSTRACT

Clustering is central to much data-driven bioinformatics research and serves as a powerful computational method. In particular, clustering helps analyze unstructured and high-dimensional data in the form of sequences, expressions, texts, and images. Further, clustering is used to gain insights into biological processes at the genomic level: clustering gene expressions, for example, reveals the natural structure inherent in the data and aids in understanding gene functions, cellular processes, cell subtypes, and gene regulation. Clustering approaches, including hierarchical, centroid-based, distribution-based, density-based, and self-organizing maps, have long been studied and used in classical machine learning settings. In contrast, deep learning (DL)-based representation and feature learning for clustering have not been reviewed or employed extensively. Since the quality of clustering depends not only on the distribution of data points but also on the learned representation, deep neural networks can be effective means of transforming high-dimensional data into a lower-dimensional feature space, leading to improved clustering results. In this paper, we review state-of-the-art DL-based approaches for cluster analysis that are based on representation learning, which we hope will be useful, particularly for bioinformatics research. Further, we explore in detail the training procedures of DL-based clustering algorithms, point out different clustering quality metrics, and evaluate several DL-based approaches on three bioinformatics use cases: bioimaging, cancer genomics, and biomedical text mining. We believe this review and the evaluation results will provide valuable insights and serve as a starting point for researchers wanting to apply DL-based unsupervised methods to solve emerging bioinformatics research problems.
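The representation-then-cluster idea reviewed above can be sketched in a few lines; note that the deep autoencoder is stood in for by PCA here to keep the example dependency-free, so this shows only the pipeline shape, not the reviewed DL methods themselves.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Representation learning step (a deep autoencoder in the reviewed methods;
# PCA stands in here as a dependency-free linear substitute)
X, y = load_digits(return_X_y=True)
Z = PCA(n_components=16, random_state=0).fit_transform(X)

# Cluster in the learned low-dimensional feature space
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(Z)
print(f"ARI vs. ground truth: {adjusted_rand_score(y, labels):.3f}")
```

Swapping the PCA step for an autoencoder's bottleneck, or training representation and cluster assignments jointly as in deep embedded clustering, follows the same structure.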


Subjects
Computational Biology/methods, Deep Learning, Cluster Analysis
4.
Eur Radiol ; 30(10): 5510-5524, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32377810

ABSTRACT

Digitization of medicine requires systematic handling of the increasing amount of health data to improve medical diagnosis. In this context, the integration of versatile diagnostic information, e.g., from anamnesis, imaging, histopathology, and clinical chemistry, and its comprehensive analysis by artificial intelligence (AI)-based tools is expected to improve diagnostic precision and therapeutic conduct. However, the complex medical environment poses a major obstacle to the translation of integrated diagnostics into clinical research and routine. There is a pressing need to address aspects like data privacy, data integration, interoperability standards, appropriate IT infrastructure, and education of staff. Beyond this, a plethora of technical, political, and ethical challenges exists, complicated by the high diversity of approaches across Europe. We therefore provide insights into current international activities on the way to digital comprehensive diagnostics. This includes a technical view of challenges and solutions for comprehensive diagnostics in terms of data integration and analysis. Current data communication standards and common IT solutions in place in hospitals are reported. Furthermore, international hospital digitalization scoring and the European funding situation were analyzed. In addition, regional activities in radiomics and the related publication trends are discussed. Our findings show that prerequisites for comprehensive diagnostics have not yet been sufficiently established throughout Europe. The manifold activities are characterized by heterogeneous digitization progress and are driven by national efforts. This emphasizes the importance of clear governance, concerted investments, and cooperation at various levels in the health systems.
Key Points
• Europe is characterized by heterogeneity in its digitization progress, with predominantly national efforts.
• Infrastructural prerequisites for comprehensive diagnostics are not in place and not sufficiently funded throughout Europe; this is particularly true for data integration.
• The clinical establishment of comprehensive diagnostics demands clear governance, significant investments, and cooperation at various levels in the healthcare systems.
• While comprehensive diagnostics is on its way, concerted efforts should be made in Europe to reach consensus concerning interoperability and standards, security and privacy, as well as ethical and legal concerns.


Subjects
Artificial Intelligence/trends, Medical Informatics/trends, Radiology/trends, Telemedicine/trends, Computer Systems, Data Mining, Europe, Humans, Interdisciplinary Research, Internationality, Privacy, Publishing/trends, Software
5.
Stud Health Technol Inform ; 317: 49-58, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39234706

ABSTRACT

INTRODUCTION: Data-driven medical research (DDMR) needs multimodal data (MMD) to sufficiently capture the complexity of clinical cases. Methods for early multimodal data integration (MMDI), i.e. integration of the data before performing a data analysis, vary from basic concatenation to applying deep learning, each with distinct characteristics and challenges. Besides early MMDI, there exists late MMDI, which performs modality-specific data analyses and then combines the analysis results. METHODS: We conducted a scoping review, following PRISMA guidelines, to find and analyze 21 reviews on methods for early MMDI published between 2019 and 2024. RESULTS: Our analysis categorized these methods into four groups and summarized group-specific characteristics that are relevant for choosing the optimal method combination for MMDI pipelines in DDMR projects. Moreover, we found that early MMDI is often performed by executing several methods sequentially in a pipeline. This early MMDI pipeline is usually subject to manual optimization. DISCUSSION: Our focus was on structural integration in DDMR. The choice of MMDI method depends on the research setting, complexity, and the research team's expertise. Future research could focus on comparing early and late MMDI approaches as well as automating the optimization of MMDI pipelines to integrate vast amounts of real-world medical data effectively, facilitating holistic DDMR.
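The most basic early-MMDI method mentioned above, feature-level concatenation, can be sketched as follows; the two modalities are synthetic stand-ins for per-patient lab values and imaging features.

```python
import numpy as np

# Hypothetical modalities for the same five patients
labs = np.random.default_rng(0).normal(size=(5, 4))      # 5 patients x 4 lab tests
imaging = np.random.default_rng(1).normal(size=(5, 16))  # 5 patients x 16 radiomics features

# Early MMDI in its most basic form: concatenate features before any analysis
fused = np.concatenate([labs, imaging], axis=1)
print(fused.shape)  # one fused feature matrix for downstream analysis
```

Late MMDI would instead run a separate analysis per modality and combine the results, e.g. by averaging per-modality predictions.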


Subjects
Biomedical Research, Humans
6.
Stud Health Technol Inform ; 316: 9-13, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176661

ABSTRACT

Data quality deficiencies significantly limit the applicability of real-world data in data-driven medical research. In this study, using an oncological use case, we report and discuss common quality deficiencies in real-world medical datasets, such as missing data, class imbalances, and timeliness issues. We compiled a multi-departmental real-world dataset comprising 13,861 cancer cases diagnosed at University Hospital Cologne and examined data quality throughout the data integration process.
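Two of the quality checks named above, missingness and class imbalance, can be sketched with pandas on a toy dataset; the columns are illustrative only, not the Cologne dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a multi-departmental oncology dataset
df = pd.DataFrame({
    "age": [62, 71, np.nan, 55, 80, np.nan],
    "tnm_stage": ["II", None, "IV", "II", "II", "II"],
    "outcome": [0, 0, 1, 0, 0, 0],  # heavily imbalanced target
})

# Share of missing values per column
missing = df.isna().mean().round(2)
# Class distribution of the target variable
imbalance = df["outcome"].value_counts(normalize=True).round(2)
print(missing.to_dict())
print(imbalance.to_dict())
```

Profiling like this, run at each step of the integration pipeline, makes it visible where deficiencies are introduced.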


Subjects
Data Accuracy, Neoplasms, Humans, Neoplasms/therapy, Medical Oncology, Germany, Electronic Health Records
7.
Stud Health Technol Inform ; 316: 301-302, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176732

ABSTRACT

The importance of cybersecurity in healthcare, with a focus on safeguarding sensitive patient information from unauthorized access, use, or disclosure, cannot be overstated. Security breaches in this sector can have significant consequences due to the widespread use of electronic health records (EHRs) and interconnected medical devices, which create opportunities for exploitation. This work presents a first step toward analyzing and organizing healthcare-specific cybersecurity problems and existing security frameworks. Special focus is put on the security risks associated with data integration centers while recognizing their role as hubs for innovation.


Subjects
Computer Security, Electronic Health Records, Confidentiality
8.
Stud Health Technol Inform ; 316: 48-52, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176670

ABSTRACT

This paper presents an implementation of an architecture based on the open-source ELK stack (Elasticsearch, Logstash, and Kibana) for real-time data analysis and visualization in the Medical Data Integration Center, University Hospital Cologne, Germany. The architecture addresses challenges in handling diverse data sources, ensuring standardized access, and facilitating seamless real-time analysis, ultimately enhancing the precision, speed, and quality of monitoring processes within the medical informatics domain.


Subjects
University Hospitals, Germany, Systems Integration, Electronic Health Records, Computer Systems, Software
9.
Stud Health Technol Inform ; 316: 726-730, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176898

ABSTRACT

This paper discusses biases in medical imaging analysis, focusing in particular on the challenges posed by the development of machine learning algorithms and generative models. It introduces a taxonomy of bias problems and addresses them through a data infrastructure initiative: PADME (Platform for Analytics and Distributed Machine-Learning for Enterprises), which is part of the National Research Data Infrastructure for Personal Health Data (NFDI4Health) project. PADME facilitates the structuring and sharing of health data while ensuring privacy and adherence to the FAIR principles. The paper presents experimental results showing that generative methods can be effective for data augmentation. Complying with the PADME infrastructure, this work proposes a solution framework to deal with bias in the different data stations and preserve privacy when transferring images. It highlights the importance of standardized data infrastructure in mitigating biases and promoting FAIR, reusable, and privacy-preserving research environments in healthcare.


Subjects
Diagnostic Imaging, Machine Learning, Humans, Bias, Algorithms, Confidentiality, Computer Security
10.
Stud Health Technol Inform ; 316: 1396-1400, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176641

ABSTRACT

This paper explores key success factors for the development and implementation of a Common Data Model (CDM) for Rare Diseases (RDs), focusing on the European context. Several challenges hinder RD diagnosis, treatment, care, and research, including data fragmentation, lack of standardisation, and interoperability (IOP) issues within healthcare information systems. We identify key issues and recommendations for an RD-CDM, drawing on international guidelines and existing infrastructure, to address organisational, consensus, interoperability, usage, and secondary-use challenges. Based on these, we analyse the importance of balancing the scope and IOP of a CDM to cater to the unique requirements of RDs while ensuring effective data exchange and usage across systems. In conclusion, a well-designed RD-CDM can bridge gaps in RD care and research, enhance patient care, and facilitate international collaborations.


Subjects
Common Data Elements, Rare Diseases, Humans, Electronic Health Records, Europe, Health Information Interoperability, Rare Diseases/therapy
11.
Stud Health Technol Inform ; 316: 358-359, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176750

ABSTRACT

This work aims to improve the FAIRness of microneurography research by integrating local (meta)data into existing research data infrastructures. In previous work, we developed an odML-based solution for local metadata storage of microneurography data. However, this solution is limited to a narrow community. As a next step, we propose integration into the Local Data Hubs, data-sharing services within the NFDI4Health infrastructure. We outline a first concept that streams selected data from the established odMLtables GUI.


Subjects
Metadata, Humans, Information Storage and Retrieval/methods, Information Dissemination
12.
Stud Health Technol Inform ; 317: 40-48, 2024 Aug 30.
Article in English | MEDLINE | ID: mdl-39234705

ABSTRACT

INTRODUCTION: The Local Data Hub (LDH) is a platform for FAIR sharing of medical research (meta-)data. In order to promote the usage of LDH in different research communities, it is important to understand domain-specific needs and the solutions currently used for data organization, and to provide support for seamless uploads to an LDH. In this work, we analyze the use case of microneurography, an electrophysiological technique for analyzing neural activity. METHODS: After performing a requirements analysis in dialogue with microneurography researchers, we propose a concept mapping and a workflow for researchers to transform and upload their metadata. Further, we implemented a semi-automatic upload extension to odMLtables, a template-based tool for handling metadata in the electrophysiology community. RESULTS: The open-source implementation enables the odML-to-LDH concept mapping, allows data anonymization from within the tool, and supports the creation of custom-made summaries of the underlying datasets. DISCUSSION: This constitutes a first step towards integrating improved FAIR processes into the research laboratory's daily workflow. In future work, we will extend this approach to other use cases to disseminate the usage of LDHs in the larger research community.


Subjects
Metadata, Humans, Information Dissemination/methods, Information Storage and Retrieval/methods
13.
Front Med (Lausanne) ; 11: 1396459, 2024.
Article in English | MEDLINE | ID: mdl-39257886

ABSTRACT

Cancer of Unknown Primary (CUP) syndrome is characterized by identifiable metastases while the primary tumor remains hidden. In recent years, various data-driven approaches have been suggested to predict the location of the primary tumor (LOP) in CUP patients, promising improved diagnosis and outcomes. These LOP prediction approaches use high-dimensional input data like images or genetic data. However, leveraging such data is challenging and resource-intensive, and therefore a potential translational barrier. Instead of using high-dimensional data, we analyzed the LOP prediction performance of low-dimensional data from routine medical care. Our findings show that such low-dimensional routine clinical information suffices as input for tree-based LOP prediction models. The best model reached a mean accuracy of 94% and a mean Matthews correlation coefficient (MCC) of 0.92 in 10-fold nested cross-validation (NCV) when distinguishing four types of cancer. When considering eight types of cancer, this model achieved a mean accuracy of 85% and a mean MCC of 0.81. This is comparable to the performance achieved by approaches using high-dimensional input data. Additionally, the distribution pattern of metastases appears to be important information for predicting the LOP.
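A hedged sketch of the tree-based approach described above: a random forest evaluated with MCC on synthetic stand-in features. Plain 10-fold cross-validation is used here for brevity; the study itself uses nested CV and real routine-care data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for low-dimensional routine clinical features,
# with four classes mimicking four candidate primary-tumor locations
X, y = make_classification(n_samples=400, n_features=12, n_informative=6,
                           n_classes=4, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
# Score each fold with the multiclass Matthews correlation coefficient
mcc = cross_val_score(clf, X, y, cv=10,
                      scoring=make_scorer(matthews_corrcoef))
print(f"mean MCC over 10 folds: {mcc.mean():.2f}")
```

Nested CV would wrap this in an outer loop with an inner hyperparameter search per fold, so that the reported score is never tuned on the evaluation data.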

14.
Stud Health Technol Inform ; 302: 125-126, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203623

ABSTRACT

Developing smart clinical decision support systems requires integrating data from several medical departments. This short paper outlines the challenges we faced in cross-departmental data integration for an oncological use case. Most severely, these challenges led to a significant reduction in case numbers: only 2.77% of cases meeting the initial inclusion criteria of the use case were present in all accessed data sources.


Subjects
Medical Informatics, Systems Integration, Medical Oncology
15.
Stud Health Technol Inform ; 302: 1027-1028, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203572

ABSTRACT

Supervised methods, such as those utilized in classification, prediction, and segmentation tasks for medical images, experience a decline in performance when the training and testing datasets violate the i.i.d. (independent and identically distributed) assumption. We therefore adopted the CycleGAN (generative adversarial network) method to translate CT (computed tomography) data between different terminals/manufacturers, aiming to eliminate the distribution shift across diverse data terminals. However, due to the mode collapse problem of GAN-based models, the generated images suffer from serious radiological artifacts. To eliminate the boundary marks and artifacts, we adopted a score-based generative model to refine the images voxel-wise. This novel combination of two generative models raises the transformation between diverse data providers to a higher fidelity level without sacrificing any significant features. In future work, we will evaluate the original and generated datasets with a broader range of supervised methods.


Subjects
Computer-Assisted Image Processing, X-Ray Computed Tomography, Computer-Assisted Image Processing/methods, Radiography, Artifacts
16.
Stud Health Technol Inform ; 302: 43-47, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203606

ABSTRACT

FHIR is a widely accepted interoperability standard for exchanging medical data, but transforming data from primary health information systems into FHIR is usually challenging and requires advanced technical skills and infrastructure. There is a critical need for low-cost solutions, and using Mirth Connect as an open-source tool provides this opportunity. We developed a reference implementation that transforms data from CSV (the most common data format) into FHIR resources using Mirth Connect, without any advanced technical resources or programming skills. The reference implementation was tested successfully for both quality and performance, and it enables healthcare providers to reproduce and improve the implemented approach for transforming raw data into FHIR resources. To ensure replicability, the channel, mapping, and templates used are publicly available on GitHub (https://github.com/alkarkoukly/CSV-FHIR-Transformer).
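Independent of Mirth Connect, the CSV-to-FHIR mapping idea can be sketched in a few lines of Python; the column names and the minimal Patient resource below are illustrative assumptions, not the repository's actual channel or templates.

```python
import csv
import io
import json

# Hypothetical CSV export; the column names are illustrative only
raw = "patient_id,family,given,birth_date\nP001,Doe,Jane,1980-05-01\n"

def row_to_patient(row):
    """Map one CSV row onto a minimal FHIR R4 Patient resource."""
    return {
        "resourceType": "Patient",
        "id": row["patient_id"],
        "name": [{"family": row["family"], "given": [row["given"]]}],
        "birthDate": row["birth_date"],
    }

patients = [row_to_patient(r) for r in csv.DictReader(io.StringIO(raw))]
print(json.dumps(patients[0], indent=2))
```

A Mirth Connect channel does the same mapping declaratively: a CSV source connector, a transformer holding the field mapping, and a FHIR destination.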


Subjects
Health Information Systems, Software, Electronic Health Records, Health Level Seven
17.
Front Med (Lausanne) ; 10: 1305415, 2023.
Article in English | MEDLINE | ID: mdl-38259836

ABSTRACT

The growing interest in data-driven medicine, in conjunction with the formation of initiatives such as the European Health Data Space (EHDS), has demonstrated the need for methodologies that are capable of facilitating privacy-preserving data analysis. Distributed Analytics (DA) as an enabler for privacy-preserving analysis across multiple data sources has shown its potential to support data-intensive research. However, the application of DA creates new challenges stemming from its distributed nature, such as identifying single points of failure (SPOFs) in DA tasks before their actual execution. Failing to detect such SPOFs can, for example, result in improper termination of the DA code, necessitating additional efforts from multiple stakeholders to resolve the malfunctions. Moreover, these malfunctions disrupt the seamless conduct of DA and entail several crucial consequences, including technical obstacles to resolve the issues, potential delays in research outcomes, and increased costs. In this study, we address this challenge by introducing a concept based on a method called Smoke Testing, an initial and foundational test run to ensure the operability of the analysis code. We review existing DA platforms and systematically extract six specific Smoke Testing criteria for DA applications. With these criteria in mind, we create an interactive environment called Development Environment for AuTomated and Holistic Smoke Testing of Analysis-Runs (DEATHSTAR), which allows researchers to perform Smoke Tests on their DA experiments. We conduct a user study with 29 participants to assess our environment and additionally apply it to three real use cases. The results of our evaluation validate its effectiveness, revealing that 96.6% of the analyses created and (Smoke) tested by participants using our approach successfully terminated without any errors.
Thus, by incorporating Smoke Testing as a fundamental method, our approach helps identify potential malfunctions early in the development process, ensuring smoother data-driven research within the scope of DA. Through its flexibility and adaptability to diverse real use cases, our solution enables more robust and efficient development of DA experiments, which contributes to their reliability.
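The core idea of a Smoke Test, a quick operability run of the analysis code before the real distributed execution, can be sketched as follows; this is a generic illustration, not DEATHSTAR's implementation.

```python
def smoke_test(analysis_fn, sample):
    """Run the analysis once on a tiny sample and report operability."""
    try:
        analysis_fn(sample)
        return True, None
    except Exception as exc:  # any failure counts as a smoke-test failure
        return False, exc

# A deliberately buggy analysis: divides by a column that may be zero
def analysis(rows):
    return [r["count"] / r["total"] for r in rows]

ok, err = smoke_test(analysis, [{"count": 1, "total": 0}])
print("passed" if ok else f"failed early: {type(err).__name__}")  # → failed early: ZeroDivisionError
```

Catching the failure on a local sample is exactly the point: the SPOF surfaces before the code is shipped to multiple data stations, where the same crash would involve several stakeholders to resolve.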

18.
Comput Methods Programs Biomed ; 242: 107814, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37722311

ABSTRACT

BACKGROUND AND OBJECTIVE: The Oxford Classification for IgA nephropathy is the most successful example of an evidence-based nephropathology classification system. The aim of our study was to replicate the glomerular components of Oxford scoring with an end-to-end deep learning pipeline that involves automatic glomerular segmentation followed by classification for mesangial hypercellularity (M), endocapillary hypercellularity (E), segmental sclerosis (S), and active crescents (C). METHODS: A total of 1,056 periodic acid-Schiff (PAS) whole slide images (WSIs), coming from 386 kidney biopsies, were annotated. Several detection models for glomeruli, based on the Mask R-CNN architecture, were trained on 587 WSIs, validated on 161 WSIs, and tested on 127 WSIs. For the development of segmentation models, 20,529 glomeruli were annotated, 16,571 as the training set and 3,958 as the validation set. The test set of the segmentation module comprised 2,948 glomeruli. For the Oxford classification, 6,206 expert-annotated glomeruli from 308 PAS WSIs were labelled for M, E, S, and C and split into a training set of 4,298 glomeruli from 207 WSIs and a test set of 1,908 glomeruli. We chose the best-performing models to construct an end-to-end pipeline, which we named MESCnn (MESC classification by neural network), for the glomerular Oxford classification of WSIs. RESULTS: Instance segmentation yielded excellent results, with an AP50 ranging between 78.2-80.1% (79.4 ± 0.7%) on the validation and 75.1-77.7% (76.5 ± 0.9%) on the test set. The aggregated Jaccard index was between 73.4-75.9% (75.0 ± 0.8%) on the validation and 69.1-73.4% (72.2 ± 1.4%) on the test set. At the granular glomerular level, the Oxford Classification was best replicated for M with EfficientNetV2-L, with a mean ROC-AUC of 90.2% and a mean precision/recall area under the curve (PR-AUC) of 81.8%; best for E with MobileNetV2 (ROC-AUC 94.7%) and ResNet50 (PR-AUC 75.8%); best for S with EfficientNetV2-M (mean ROC-AUC 92.7%, mean PR-AUC 87.7%); and best for C with EfficientNetV2-L (ROC-AUC 92.3%) and EfficientNetV2-S (PR-AUC 54.7%). At the biopsy level, the correlation between expert and deep learning labels fulfilled the demands of the Oxford Classification. CONCLUSION: We designed an end-to-end pipeline for glomerular Oxford Classification on both a granular glomerular and an entire biopsy level. Both the glomerular segmentation and the classification modules are freely available to the renal medicine community for further development.
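The aggregated Jaccard index reported above builds on the per-object Jaccard index (intersection over union) of binary masks, which can be computed as follows; this is a generic sketch, not the MESCnn code.

```python
import numpy as np

def jaccard(mask_a, mask_b):
    """Jaccard index (IoU) of two binary segmentation masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 1.0

a = np.zeros((8, 8), dtype=bool); a[2:6, 2:6] = True   # 16-px "glomerulus"
b = np.zeros((8, 8), dtype=bool); b[3:7, 3:7] = True   # shifted prediction
print(f"IoU = {jaccard(a, b):.3f}")  # 9-px overlap, 23-px union
```

The aggregated variant pools intersections and unions over all matched objects in an image before dividing, so large and small glomeruli are weighted by their pixel counts.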


Subjects
Deep Learning, IgA Glomerulonephritis, Humans, IgA Glomerulonephritis/diagnosis, IgA Glomerulonephritis/pathology, Glomerular Filtration Rate, Kidney Glomerulus/pathology, Kidney/diagnostic imaging
19.
Article in English | MEDLINE | ID: mdl-32750845

ABSTRACT

The study of genetic variants (GVs) can help find correlated population groups, identify cohorts that are predisposed to common diseases, and explain differences in disease susceptibility and in how patients react to drugs. Machine learning techniques are increasingly being applied to identify interacting GVs and to understand their complex phenotypic traits. Since the performance of a learning algorithm depends not only on the size and nature of the data but also on the quality of the underlying representation, deep neural networks (DNNs) can learn non-linear mappings that transform GV data into representations friendlier to clustering and classification than manual feature selection. In this paper, we propose convolutional embedded networks (CEN), in which we combine two DNN architectures, convolutional embedded clustering (CEC) and a convolutional autoencoder (CAE) classifier, for clustering individuals and predicting geographic ethnicity based on GVs, respectively. We applied CAE-based representation learning to 95 million GVs from the '1000 Genomes' (covering 2,504 individuals from 26 ethnic origins) and 'Simons Genome Diversity' (covering 279 individuals from 130 ethnic origins) projects. Quantitative and qualitative analyses with a focus on accuracy and scalability show that our approach outperforms state-of-the-art approaches such as VariantSpark and ADMIXTURE. In particular, CEC can cluster targeted population groups in 22 hours with an adjusted Rand index (ARI) of 0.915, a normalized mutual information (NMI) of 0.92, and a clustering accuracy (ACC) of 89 percent. In turn, the CAE classifier can predict the geographic ethnicity of unknown samples with an F1 score of 0.9004 and a Matthews correlation coefficient (MCC) of 0.8245. Further, to provide interpretations of the predictions, we identify significant biomarkers using gradient boosted trees (GBT) and SHapley Additive exPlanations (SHAP). Overall, our approach is transparent and faster than the baseline methods, and scalable for 5 to 100 percent of the full human genome.
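Of the three clustering metrics reported above, ACC is the least standard: it matches predicted clusters to true labels with the Hungarian algorithm before computing accuracy, since cluster labels are arbitrary. A generic sketch (not the CEN implementation):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(true, pred):
    """Best-match clustering accuracy (ACC) via the Hungarian algorithm."""
    true, pred = np.asarray(true), np.asarray(pred)
    k = max(true.max(), pred.max()) + 1
    # Contingency table: how often predicted cluster p co-occurs with true label t
    cost = np.zeros((k, k), dtype=int)
    for t, p in zip(true, pred):
        cost[p, t] += 1
    rows, cols = linear_sum_assignment(-cost)  # negate to maximize matches
    return cost[rows, cols].sum() / len(true)

true = [0, 0, 1, 1, 2, 2]
pred = [1, 1, 0, 0, 2, 2]  # same partition, permuted cluster labels
print(clustering_accuracy(true, pred))  # → 1.0
```

ARI and NMI are permutation-invariant by construction, so they need no such matching step.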


Subjects
Machine Learning, Neural Networks (Computer), Algorithms, Cluster Analysis, Humans
20.
Stud Health Technol Inform ; 290: 22-26, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35672963

ABSTRACT

Medical data science aims to facilitate knowledge discovery by assisting in the analysis of data, algorithms, and results. The FAIR principles aim to guide scientific data management and stewardship, and are relevant to all digital health ecosystem stakeholders. The FAIR4Health project aims to facilitate and encourage the health research community to reuse datasets derived from publicly funded research initiatives using the FAIR principles. The 'FAIRness for FHIR' project aims to provide guidance on how HL7 FHIR could be utilized as a common data model to support the FAIRification process for health datasets. The first expected result is an HL7 FHIR Implementation Guide (IG) called FHIR4FAIR, covering how FHIR can be used for FAIRification in different scenarios. This IG aims to provide practical underpinnings for the FAIR4Health FAIRification workflow as a domain-specific extension of the GO FAIR process, while simplifying curation, advancing interoperability, and providing insights into a roadmap for FAIR certification of health datasets.


Subjects
Electronic Health Records, Health Level Seven, Data Management, Ecosystem, Workflow