Search | Biblioteca Virtual en Salud Odontología. Uruguay

Federated unsupervised random forest for privacy-preserving patient stratification.

Pfeifer, Bastian; Sirocchi, Christel; Bloice, Marcus D; Kreuzthaler, Markus; Urschler, Martin.

Bioinformatics ; 40(Supplement_2): ii198-ii207, 2024 Sep 01.

Article in English | MEDLINE | ID: mdl-39230698

ABSTRACT

MOTIVATION: In the realm of precision medicine, effective patient stratification and disease subtyping demand innovative methodologies tailored for multi-omics data. Clustering techniques applied to multi-omics data have become instrumental in identifying distinct subgroups of patients, enabling a finer-grained understanding of disease variability. Meanwhile, clinical datasets are often small and must be aggregated from multiple hospitals. Online data sharing, however, is seen as a significant challenge due to privacy concerns, potentially impeding big data's role in medical advancements using machine learning. This work establishes a powerful framework for advancing precision medicine through unsupervised random forest-based clustering in combination with federated computing. RESULTS: We introduce a novel multi-omics clustering approach utilizing unsupervised random forests. The unsupervised nature of the random forest enables the determination of cluster-specific feature importance, unraveling key molecular contributors to distinct patient groups. Our methodology is designed for federated execution, a crucial aspect in the medical domain where privacy concerns are paramount. We have validated our approach on machine learning benchmark datasets as well as on cancer data from The Cancer Genome Atlas. Our method is competitive with the state-of-the-art in terms of disease subtyping, but at the same time substantially improves the cluster interpretability. Experiments indicate that local clustering performance can be improved through federated computing. AVAILABILITY AND IMPLEMENTATION: The proposed methods are available as an R-package (https://github.com/pievos101/uRF).

Subject(s)

Precision Medicine; Humans; Cluster Analysis; Precision Medicine/methods; Unsupervised Machine Learning; Machine Learning; Neoplasms; Privacy; Algorithms; Random Forest

Medical-informed machine learning: integrating prior knowledge into medical decision systems.

Sirocchi, Christel; Bogliolo, Alessandro; Montagna, Sara.

BMC Med Inform Decis Mak ; 24(Suppl 4): 186, 2024 Jun 28.

Article in English | MEDLINE | ID: mdl-38943085

ABSTRACT

BACKGROUND: Clinical medicine offers a promising arena for applying Machine Learning (ML) models. However, despite numerous studies employing ML in medical data analysis, only a fraction have impacted clinical care. This article underscores the importance of utilising ML in medical data analysis, recognising that ML alone may not adequately capture the full complexity of clinical data, thereby advocating for the integration of medical domain knowledge in ML. METHODS: The study conducts a comprehensive review of prior efforts in integrating medical knowledge into ML and maps these integration strategies onto the phases of the ML pipeline, encompassing data pre-processing, feature engineering, model training, and output evaluation. The study further explores the significance and impact of such integration through a case study on diabetes prediction. Here, clinical knowledge, encompassing rules, causal networks, intervals, and formulas, is integrated at each stage of the ML pipeline, resulting in a spectrum of integrated models. RESULTS: The findings highlight the benefits of integration in terms of accuracy, interpretability, data efficiency, and adherence to clinical guidelines. In several cases, integrated models outperformed purely data-driven approaches, underscoring the potential for domain knowledge to enhance ML models through improved generalisation. In other cases, the integration was instrumental in enhancing model interpretability and ensuring conformity with established clinical guidelines. Notably, knowledge integration also proved effective in maintaining performance under limited data scenarios. CONCLUSIONS: By illustrating various integration strategies through a clinical case study, this work provides guidance to inspire and facilitate future integration efforts. Furthermore, the study identifies the need to refine domain knowledge representation and fine-tune its contribution to the ML model as the two main challenges to integration and aims to stimulate further research in this direction.

Subject(s)

Decision Support Systems, Clinical; Machine Learning; Humans

Robust statistical modeling improves sensitivity of high-throughput RNA structure probing experiments.

Selega, Alina; Sirocchi, Christel; Iosub, Ira; Granneman, Sander; Sanguinetti, Guido.

Nat Methods ; 14(1): 83-89, 2017 Jan.

Article in English | MEDLINE | ID: mdl-27819660

ABSTRACT

Structure probing coupled with high-throughput sequencing could revolutionize our understanding of the role of RNA structure in regulation of gene expression. Despite recent technological advances, intrinsic noise and high sequence coverage requirements greatly limit the applicability of these techniques. Here we describe a probabilistic modeling pipeline that accounts for biological variability and biases in the data, yielding statistically interpretable scores for the probability of nucleotide modification transcriptome wide. Using two yeast data sets, we demonstrate that our method has increased sensitivity, and thus our pipeline identifies modified regions on many more transcripts than do existing pipelines. Our method also provides confident predictions at much lower sequence coverage levels than those recommended for reliable structural probing. Our results show that statistical modeling extends the scope and potential of transcriptome-wide structure probing experiments.

Subject(s)

Algorithms; High-Throughput Nucleotide Sequencing/methods; Models, Statistical; RNA/chemistry; RNA/genetics; Sequence Analysis, RNA/methods; Transcriptome/genetics; Base Pairing; Base Sequence; Computational Biology/methods; Humans; Nucleic Acid Conformation

Exploring machine learning for untargeted metabolomics using molecular fingerprints.

Sirocchi, Christel; Biancucci, Federica; Donati, Matteo; Bogliolo, Alessandro; Magnani, Mauro; Menotta, Michele; Montagna, Sara.

Comput Methods Programs Biomed ; 250: 108163, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38626559

ABSTRACT

BACKGROUND: Metabolomics, the study of substrates and products of cellular metabolism, offers valuable insights into an organism's state under specific conditions and has the potential to revolutionise preventive healthcare and pharmaceutical research. However, analysing large metabolomics datasets remains challenging, with available methods relying on limited and incompletely annotated metabolic pathways. METHODS: This study, inspired by well-established methods in drug discovery, employs machine learning on metabolite fingerprints to explore the relationship of their structure with responses in experimental conditions beyond known pathways, shedding light on metabolic processes. It evaluates fingerprinting effectiveness in representing metabolites, addressing challenges like class imbalance, data sparsity, high dimensionality, duplicate structural encoding, and interpretable features. Feature importance analysis is then applied to reveal key chemical configurations affecting classification, identifying related metabolite groups. RESULTS: The approach is tested on two datasets: one on Ataxia Telangiectasia and another on endothelial cells under low oxygen. Machine learning on molecular fingerprints predicts metabolite responses effectively, and feature importance analysis aligns with known metabolic pathways, unveiling new affected metabolite groups for further study. CONCLUSION: In conclusion, the presented approach leverages the strengths of drug discovery to address critical issues in metabolomics research and aims to bridge the gap between these two disciplines. This work lays the foundation for future research in this direction, possibly exploring alternative structural encodings and machine learning models.

Subject(s)

Machine Learning; Metabolomics; Metabolomics/methods; Humans; Cell Line; Ataxia Telangiectasia/metabolism; Cell Hypoxia/physiology

Topological network features determine convergence rate of distributed average algorithms.

Sirocchi, Christel; Bogliolo, Alessandro.

Sci Rep ; 12(1): 21831, 2022 Dec 17.

Article in English | MEDLINE | ID: mdl-36528734

ABSTRACT

Gossip algorithms are message-passing schemes designed to compute averages and other global functions over networks through asynchronous and randomised pairwise interactions. Gossip-based protocols have drawn much attention for achieving robust and fault-tolerant communication while maintaining simplicity and scalability. However, the frequent propagation of redundant information makes them inefficient and resource-intensive. Most previous works have been devoted to deriving performance bounds and developing faster algorithms tailored to specific structures. In contrast, this study focuses on characterising the effect of topological network features on performance so that faster convergence can be engineered by acting on the underlying network rather than the gossip algorithm. The numerical experiments identify the topological limiting factors, the most predictive graph metrics, and the most efficient algorithms for each graph family and for all graphs, providing guidelines for designing and maintaining resource-efficient networks. Regression analyses confirm the explanatory power of structural features and demonstrate the validity of the topological approach in performance estimation. Finally, the high predictive capabilities of local metrics and the possibility of computing them in a distributed manner and at a low computational cost inform the design and implementation of a novel distributed approach for predicting performance from the network topology.
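
The randomised pairwise averaging that the abstract describes can be sketched as follows. This is a minimal illustration of a standard gossip scheme, not necessarily any of the algorithms evaluated in the paper. The function name `gossip_average`, the six-node ring topology, the initial values, and the round count are all assumptions chosen for the example.

```python
import random


def gossip_average(values, edges, rounds, seed=0):
    """Randomised pairwise gossip (illustrative sketch).

    At each step one edge (i, j) is drawn uniformly at random and both
    endpoints replace their values with the mean of the pair. Each
    exchange preserves the sum, so on a connected graph the values
    drift towards the global average.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    x = list(values)           # copy, so the caller's list is untouched
    for _ in range(rounds):
        i, j = rng.choice(edges)
        mean = (x[i] + x[j]) / 2.0
        x[i] = x[j] = mean
    return x


# Hypothetical example network: a ring of six nodes.
n = 6
ring = [(k, (k + 1) % n) for k in range(n)]
initial = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]  # global average is 25.0
final = gossip_average(initial, ring, rounds=2000)
print(final)
```

Swapping `ring` for a different edge list is one way to observe how the network topology affects the number of rounds needed before the values agree.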
