Results 1 - 20 of 98
1.
Brief Bioinform ; 25(2), 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38446741

ABSTRACT

Identifying protein-protein interactions (PPIs) is crucial for deciphering biological pathways. Numerous prediction methods have been developed as cheap alternatives to biological experiments, reporting surprisingly high accuracy estimates. We systematically investigated how much reproducible deep learning models depend on data leakage, sequence similarities and node degree information, and compared them with basic machine learning models. We found that overlaps between training and test sets resulting from random splitting lead to strongly overestimated performances. In this setting, models learn solely from sequence similarities and node degrees. When data leakage is avoided by minimizing sequence similarities between training and test set, performances become random. Moreover, baseline models directly leveraging sequence similarity and network topology show good performances at a fraction of the computational cost. Thus, we advocate that any improvements should be reported relative to baseline methods in the future. Our findings suggest that predicting PPIs remains an unsolved task for proteins showing little sequence similarity to previously studied proteins, highlighting that further experimental research into the 'dark' protein interactome and better computational methods are needed.
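
The leakage problem described above can be made concrete by counting how many test pairs contain a protein that also occurs in training. The following is a minimal sketch, not the authors' pipeline: the toy interaction list, the function names, and the protein-disjoint split (a simplified stand-in for the sequence-similarity-based partitioning the abstract refers to) are all assumptions made for illustration.

```python
import random


def random_split(pairs, test_fraction, seed=0):
    """Shuffle the pairs and cut off a test set, ignoring protein identity."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def protein_disjoint_split(pairs, test_proteins):
    """Keep only pairs that lie entirely inside one partition of the proteins."""
    test_proteins = set(test_proteins)
    train = [(a, b) for a, b in pairs
             if a not in test_proteins and b not in test_proteins]
    test = [(a, b) for a, b in pairs
            if a in test_proteins and b in test_proteins]
    return train, test


def leakage_fraction(train, test):
    """Fraction of test pairs with at least one protein seen in training."""
    seen = {protein for pair in train for protein in pair}
    if not test:
        return 0.0
    leaked = sum(1 for a, b in test if a in seen or b in seen)
    return leaked / len(test)


# Hypothetical toy interactome; the protein names are placeholders.
PAIRS = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P4", "P5"),
         ("P4", "P6"), ("P5", "P6"), ("P1", "P4"), ("P2", "P5")]

train_r, test_r = random_split(PAIRS, test_fraction=0.25)
train_d, test_d = protein_disjoint_split(PAIRS, test_proteins={"P4", "P5", "P6"})

print("random split leakage:", leakage_fraction(train_r, test_r))
print("protein-disjoint split leakage:", leakage_fraction(train_d, test_d))
```

A protein-disjoint split discards the pairs that cross the partition, so it trades data volume for a cleaner evaluation. Controlling for sequence similarity, as in the study, is stricter still.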


Subject(s)
Machine Learning
2.
Nature ; 579(7799): 409-414, 2020 03.
Article in English | MEDLINE | ID: mdl-32188942

ABSTRACT

Plants are essential for life and are extremely diverse organisms with unique molecular capabilities. Here we present a quantitative atlas of the transcriptomes, proteomes and phosphoproteomes of 30 tissues of the model plant Arabidopsis thaliana. Our analysis provides initial answers to how many genes exist as proteins (more than 18,000), where they are expressed, in which approximate quantities (a dynamic range of more than six orders of magnitude) and to what extent they are phosphorylated (over 43,000 sites). We present examples of how the data may be used, such as to discover proteins that are translated from short open-reading frames, to uncover sequence motifs that are involved in the regulation of protein production, and to identify tissue-specific protein complexes or phosphorylation-mediated signalling events. Interactive access to this resource for the plant community is provided by the ProteomicsDB and ATHENA databases, which include powerful bioinformatics tools to explore and characterize Arabidopsis proteins, their modifications and interactions.


Subject(s)
Arabidopsis Proteins/analysis , Arabidopsis Proteins/chemistry , Arabidopsis/chemistry , Mass Spectrometry , Proteome/analysis , Proteome/chemistry , Proteomics , Amino Acid Motifs , Arabidopsis/anatomy & histology , Arabidopsis/genetics , Arabidopsis/metabolism , Arabidopsis Proteins/biosynthesis , Arabidopsis Proteins/genetics , Databases, Protein , Datasets as Topic , Gene Expression Regulation, Plant , Molecular Sequence Annotation , Open Reading Frames , Organ Specificity , Phosphoproteins/analysis , Phosphoproteins/chemistry , Phosphoproteins/genetics , Phosphorylation , Proteome/biosynthesis , Proteome/genetics , RNA, Messenger/analysis , RNA, Messenger/biosynthesis , RNA, Messenger/genetics , Transcriptome
3.
Nucleic Acids Res ; 52(W1): W481-W488, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38783119

ABSTRACT

In recent decades, the development of new drugs has become increasingly expensive and inefficient, and the molecular mechanisms of most pharmaceuticals remain poorly understood. In response, computational systems and network medicine tools have emerged to identify potential drug repurposing candidates. However, these tools often require complex installation and lack intuitive visual network mining capabilities. To tackle these challenges, we introduce Drugst.One, a platform that assists specialized computational medicine tools in becoming user-friendly, web-based utilities for drug repurposing. With just three lines of code, Drugst.One turns any systems biology software into an interactive web tool for modeling and analyzing complex protein-drug-disease networks. Demonstrating its broad adaptability, Drugst.One has been successfully integrated with 21 computational systems medicine tools. Available at https://drugst.one, Drugst.One has significant potential for streamlining the drug discovery process, allowing researchers to focus on essential aspects of pharmaceutical treatment research.


Subject(s)
Drug Repositioning , Software , Drug Repositioning/methods , Humans , Internet , Drug Discovery/methods , Systems Biology/methods , Computational Biology/methods
4.
Brief Bioinform ; 24(5), 2023 09 20.
Article in English | MEDLINE | ID: mdl-37670505

ABSTRACT

A key problem in systems biology is the discovery of regulatory mechanisms that drive phenotypic behaviour of complex biological systems in the form of multi-level networks. Modern multi-omics profiling techniques probe these fundamental regulatory networks but are often hampered by experimental restrictions leading to missing data or partially measured omics types for subsets of individuals due to cost restrictions. In such scenarios, in which missing data is present, classical computational approaches to infer regulatory networks are limited. In recent years, approaches have been proposed to infer sparse regression models in the presence of missing information. Nevertheless, these methods have not been adopted for regulatory network inference yet. In this study, we integrated regression-based methods that can handle missingness into KiMONo, a Knowledge guided Multi-Omics Network inference approach, and benchmarked their performance on commonly encountered missing data scenarios in single- and multi-omics studies. Overall, two-step approaches that explicitly handle missingness performed best for a wide range of random- and block-missingness scenarios on imbalanced omics-layers dimensions, while methods implicitly handling missingness performed best on balanced omics-layers dimensions. Our results show that robust multi-omics network inference in the presence of missing data with KiMONo is feasible and thus allows users to leverage available multi-omics data to its full extent.


Subject(s)
Benchmarking , Multiomics , Humans , Systems Biology
5.
Brief Bioinform ; 23(1), 2022 01 17.
Article in English | MEDLINE | ID: mdl-34850807

ABSTRACT

Cytometry techniques are widely used to discover cellular characteristics at single-cell resolution. Many data analysis methods for cytometry data focus solely on identifying subpopulations via clustering and testing for differential cell abundance. For differential expression analysis of markers between conditions, only a few tools exist. These tools either reduce the data distribution to medians, discarding valuable information, or have underlying assumptions that may not hold for all expression patterns. Here, we systematically evaluated existing and novel approaches for differential expression analysis on real and simulated CyTOF data. We found that methods using median marker expressions compute fast and reliable results when the data are not strongly zero-inflated. Methods using all data detect changes in strongly zero-inflated markers, but partially suffer from overprediction or cannot handle big datasets. We present a new method, CyEMD, based on calculating the earth mover's distance between expression distributions that can handle strong zero-inflation without being too sensitive. Additionally, we developed CYANUS - CYtometry ANalysis Using Shiny - a user-friendly R Shiny App allowing the user to analyze cytometry data with state-of-the-art tools, including well-performing methods from our comparison. A public web interface is available at https://exbio.wzw.tum.de/cyanus/.
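
To illustrate why a distribution-level distance can detect changes that medians miss on zero-inflated markers, the sketch below computes a one-dimensional earth mover's distance as the area between two empirical cumulative distribution functions. It is an illustration under stated assumptions, not the CyEMD implementation: the function name and the toy expression values are invented, and any binning, normalization or significance testing that a real method would need is left out.

```python
import statistics


def earth_movers_distance(xs, ys):
    """1D earth mover's distance between two empirical samples.

    Computed as the area between the two empirical CDFs.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    distance = 0.0
    i = j = 0
    for left, right in zip(points, points[1:]):
        # Advance the counters so that i and j count the values <= left.
        while i < len(xs) and xs[i] <= left:
            i += 1
        while j < len(ys) and ys[j] <= left:
            j += 1
        cdf_gap = abs(i / len(xs) - j / len(ys))
        distance += cdf_gap * (right - left)
    return distance


# Hypothetical zero-inflated marker expression in two conditions.
control = [0.0, 0.0, 0.0, 1.0, 2.0]
treated = [0.0, 0.0, 0.0, 3.0, 6.0]

median_shift = statistics.median(treated) - statistics.median(control)
emd = earth_movers_distance(control, treated)
print("median difference:", median_shift)
print("earth mover's distance:", emd)
```

In this toy case both medians are zero because most cells do not express the marker, while the distance between the full distributions is clearly positive.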


Subject(s)
Cluster Analysis , Biomarkers
6.
Bioinformatics ; 39(1), 2023 01 01.
Article in English | MEDLINE | ID: mdl-36579860

ABSTRACT

MOTIVATION: During disease progression or organism development, alternative splicing may lead to isoform switches that demonstrate similar temporal patterns and reflect the alternative splicing co-regulation of such genes. Tools for dynamic process analysis usually neglect alternative splicing. RESULTS: Here, we propose Spycone, a splicing-aware framework for time course data analysis. Spycone exploits a novel isoform switch (IS) detection algorithm and offers downstream analysis such as network and gene set enrichment. We demonstrate the performance of Spycone using simulated and real-world data of SARS-CoV-2 infection. AVAILABILITY AND IMPLEMENTATION: The Spycone package is available as a PyPI package. The source code of Spycone is available under the GPLv3 license at https://github.com/yollct/spycone and the documentation at https://spycone.readthedocs.io/en/latest/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Alternative Splicing , COVID-19 , Humans , SARS-CoV-2/genetics , Software , Algorithms
7.
Bioinformatics ; 39(6), 2023 06 01.
Article in English | MEDLINE | ID: mdl-37233198

ABSTRACT

SUMMARY: We present ROBUST-Web, which implements our recently presented ROBUST disease module mining algorithm in a user-friendly web application. ROBUST-Web features seamless downstream disease module exploration via integrated gene set enrichment analysis, tissue expression annotation, and visualization of drug-protein and disease-gene links. Moreover, ROBUST-Web includes bias-aware edge costs for the underlying Steiner tree model as a new algorithmic feature, which make it possible to correct for study bias in protein-protein interaction networks and further improve the robustness of the computed modules. AVAILABILITY AND IMPLEMENTATION: Web application: https://robust-web.net. Source code of web application and Python package with new bias-aware edge costs: https://github.com/bionetslab/robust-web, https://github.com/bionetslab/robust_bias_aware.


Subject(s)
Algorithms , Software , Protein Interaction Maps
8.
Bioinformatics ; 39(5), 2023 05 04.
Article in English | MEDLINE | ID: mdl-37084275

ABSTRACT

MOTIVATION: Cancer is one of the leading causes of death worldwide. Despite significant improvements in prevention and treatment, mortality remains high for many cancer types. Hence, innovative methods that use molecular data to stratify patients and identify biomarkers are needed. Promising biomarkers can also be inferred from competing endogenous RNA (ceRNA) networks that capture the gene-miRNA gene regulatory landscape. Thus far, the role of these biomarkers could only be studied globally but not in a sample-specific manner. To mitigate this, we introduce spongEffects, a novel method that infers subnetworks (or modules) from ceRNA networks and calculates patient- or sample-specific scores related to their regulatory activity. RESULTS: We show how spongEffects can be used for downstream interpretation and machine learning tasks such as tumor classification and for identifying subtype-specific regulatory interactions. In a concrete example of breast cancer subtype classification, we prioritize modules impacting the biology of the different subtypes. In summary, spongEffects prioritizes ceRNA modules as biomarkers and offers insights into the miRNA regulatory landscape. Notably, these module scores can be inferred from gene expression data alone and can thus be applied to cohorts where miRNA expression information is lacking. AVAILABILITY AND IMPLEMENTATION: https://bioconductor.org/packages/devel/bioc/html/SPONGE.html.


Subject(s)
Breast Neoplasms , MicroRNAs , RNA, Long Noncoding , Humans , Female , MicroRNAs/genetics , MicroRNAs/metabolism , Gene Regulatory Networks , Breast Neoplasms/genetics , Machine Learning , Gene Expression Regulation, Neoplastic , RNA, Long Noncoding/genetics
9.
Platelets ; 35(1): 2358244, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38845541

ABSTRACT

Thromboembolic events are common in patients with essential thrombocythemia (ET). However, the pathophysiological mechanisms underlying the increased thrombotic risk remain to be determined. Here, we perform the first phenotypical characterization of platelet expression using single-cell mass cytometry in six ET patients and six age- and sex-matched healthy individuals. A large panel of 18 transmembrane regulators of platelet function and activation was analyzed, at baseline and after ex-vivo stimulation with thrombin receptor-activating peptide (TRAP). We detected a significant overexpression of the activation marker CD62P (p-Selectin) (p = .049) and the collagen receptor GPVI (p = .044) in non-stimulated ET platelets. In contrast, ET platelets had a lower expression of the integrin subunits of the fibrinogen receptor GPIIb/IIIa CD41 (p = .036) and CD61 (p = .044) and of the von Willebrand factor receptor CD42b (p = .044). Using the FlowSOM algorithm, we identified 2 subclusters of ET platelets with a prothrombotic expression profile, one of them (cluster 3) significantly overrepresented in ET (22.13% of the total platelets in ET, 2.94% in controls, p = .035). Platelet counts were significantly increased in ET compared to controls (p = .0123). In ET, MPV inversely correlated with platelet count (r = -0.96). These data highlight the prothrombotic phenotype of ET and postulate GPVI as a potential target to prevent thrombosis in these patients.


Essential thrombocythemia (ET) is a rare disease characterized by an increased number of platelets in the blood. As a complication, many of these patients develop a blood clot, which can be life-threatening. So far, the reason behind the higher risk of blood clots is unclear. In this study, we analyzed platelet surface markers that play a critical role in platelet function and platelet activation using a modern technology called mass cytometry. For this purpose, blood samples from 6 patients with ET and 6 healthy control individuals were analyzed. We found significant differences between ET platelets and healthy platelets. ET platelets had higher expression levels of p-Selectin (CD62P), a key marker of platelet activation, and of the collagen receptor GPVI, which is important for clot formation. These results may be driven by a specific platelet subcluster overrepresented in ET. Other surface markers, such as the fibrinogen receptor GPIIb/IIIa CD41, CD61, and the von Willebrand factor receptor CD42b, were expressed at lower levels in ET platelets. When ET platelets were treated with the clotting factor thrombin (thrombin receptor-activating peptide, TRAP), we found a differential response in platelet activation compared to healthy platelets. In conclusion, our results show an increased activation and clotting potential of ET platelets. The platelet surface protein GPVI may be a potential drug target to prevent abnormal blood clotting in ET patients.


Subject(s)
Blood Platelets , Thrombocythemia, Essential , Thrombosis , Humans , Thrombocythemia, Essential/metabolism , Thrombocythemia, Essential/complications , Blood Platelets/metabolism , Male , Female , Thrombosis/metabolism , Thrombosis/etiology , Middle Aged , Aged , Flow Cytometry/methods , Platelet Activation , Case-Control Studies , Adult
10.
Nucleic Acids Res ; 50(W1): W138-W144, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35580047

ABSTRACT

Cancer is a heterogeneous disease characterized by unregulated cell growth and promoted by mutations in cancer driver genes, some of which encode suitable drug targets. Since the distinct set of cancer driver genes can vary between and within cancer types, evidence-based selection of drugs is crucial for targeted therapy following the precision medicine paradigm. However, many putative cancer driver genes cannot be targeted directly, suggesting an indirect approach that considers alternative functionally related targets in the gene interaction network. Once potential drug targets have been identified, it is essential to consider all available drugs. Since tools that offer support for systematic discovery of drug repurposing candidates in oncology are lacking, we developed CADDIE, a web application integrating six human gene-gene and four drug-gene interaction databases, information regarding cancer driver genes, cancer-type specific mutation frequencies, gene expression information, genetically related diseases, and anticancer drugs. CADDIE offers access to various network algorithms for identifying drug targets and drug repurposing candidates. It guides users from the selection of seed genes to the identification of therapeutic targets or drug candidates, making network medicine algorithms accessible for clinical research. CADDIE is available at https://exbio.wzw.tum.de/caddie/ and programmatically via a Python package at https://pypi.org/project/caddiepy/.


Subject(s)
Antineoplastic Agents , Neoplasms , Humans , Neoplasms/drug therapy , Neoplasms/genetics , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Software , Oncogenes , Algorithms , Mutation , Drug Interactions , Drug Repositioning
11.
Proteomics ; 23(23-24): e2200462, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37706624

ABSTRACT

Transcription factors (TFs) are essential players in orchestrating the regulatory landscape in cells. Still, their exact modes of action and dependencies on other regulatory aspects remain elusive. Since TFs act in a cell type-specific manner and each TF has its own characteristics, untangling their regulatory interactions from an experimental point of view is laborious and convoluted. Thus, there is an ongoing development of computational tools that estimate transcription factor activity (TFA) from a variety of data modalities, either based on a mapping of TFs to their putative target genes or in a genome-wide, gene-unspecific fashion. These tools can help to gain insights into TF regulation and to prioritize candidates for experimental validation. We want to give an overview of available computational tools that estimate TFA, illustrate examples of their application, debate common result validation strategies, and discuss assumptions and concomitant limitations.


Subject(s)
Gene Expression Regulation , Transcription Factors , Transcription Factors/metabolism , Genome , Computational Biology , Gene Regulatory Networks
12.
Brief Bioinform ; 22(5), 2021 09 02.
Article in English | MEDLINE | ID: mdl-33782690

ABSTRACT

In network and systems medicine, active module identification methods (AMIMs) are widely used for discovering candidate molecular disease mechanisms. To this end, AMIMs combine network analysis algorithms with molecular profiling data, most commonly, by projecting gene expression data onto generic protein-protein interaction (PPI) networks. Although active module identification has led to various novel insights into complex diseases, there is increasing awareness in the field that the combination of gene expression data and PPI network is problematic because up-to-date PPI networks have a very small diameter and are subject to both technical and literature bias. In this paper, we report the results of an extensive study where we analyzed for the first time whether widely used AMIMs really benefit from using PPI networks. Our results clearly show that, except for the recently proposed AMIM DOMINO, the tested AMIMs do not produce biologically more meaningful candidate disease modules on widely used PPI networks than on random networks with the same node degrees. AMIMs hence mainly learn from the node degrees and mostly fail to exploit the biological knowledge encoded in the edges of the PPI networks. This has far-reaching consequences for the field of active module identification. In particular, we suggest that novel algorithms are needed which overcome the degree bias of most existing AMIMs and/or work with customized, context-specific networks instead of generic PPI networks.
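
The comparison against "random networks with the same node degrees" relies on degree-preserving randomization. One common way to obtain such networks is repeated double edge swapping, sketched below. This is a generic illustration and not the randomization code used in the study: the function names, the toy edge list and the number of swaps are assumptions.

```python
import random
from collections import Counter


def degree_preserving_rewire(edges, n_swaps, seed=0):
    """Randomize an undirected simple graph with double edge swaps.

    Each successful swap replaces edges (a, b) and (c, d) by (a, d) and (c, b),
    which leaves every node degree unchanged.
    """
    rng = random.Random(seed)
    edge_list = sorted({tuple(sorted(edge)) for edge in edges})
    if len(edge_list) < 2:
        return edge_list
    edge_set = set(edge_list)
    done = 0
    attempts = 0
    max_attempts = 100 * n_swaps
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c
        new_1 = tuple(sorted((a, d)))
        new_2 = tuple(sorted((c, b)))
        # Reject swaps that would create self-loops or duplicate edges.
        if a == d or c == b or new_1 in edge_set or new_2 in edge_set:
            continue
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new_1, new_2}
        edge_list[i], edge_list[j] = new_1, new_2
        done += 1
    return edge_list


def degrees(edges):
    """Count how many edges touch each node."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


# Hypothetical toy network; node names are placeholders.
ppi = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
       ("D", "E"), ("E", "F"), ("F", "G"), ("C", "G")]
rewired = degree_preserving_rewire(ppi, n_swaps=20)
print(sorted(rewired))
print(degrees(rewired) == degrees(ppi))
```

If a method scores equally well on the rewired network, its results are driven by node degrees and not by which specific proteins interact.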


Subject(s)
Algorithms , Gene Expression , Protein Interaction Mapping/methods , Protein Interaction Maps/genetics , Systems Biology/methods , Amyotrophic Lateral Sclerosis/genetics , Amyotrophic Lateral Sclerosis/metabolism , Carcinoma, Non-Small-Cell Lung/genetics , Carcinoma, Non-Small-Cell Lung/metabolism , Colitis, Ulcerative/genetics , Colitis, Ulcerative/metabolism , Crohn Disease/genetics , Crohn Disease/metabolism , Humans , Huntington Disease/genetics , Huntington Disease/metabolism , Lung Neoplasms/genetics , Lung Neoplasms/metabolism , Phenotype , Proteins/genetics , Proteins/metabolism
13.
Brief Bioinform ; 22(2): 642-663, 2021 03 22.
Article in English | MEDLINE | ID: mdl-33147627

ABSTRACT

SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a novel virus of the family Coronaviridae. The virus causes the infectious disease COVID-19. The biology of coronaviruses has been studied for many years. However, bioinformatics tools designed explicitly for SARS-CoV-2 have only recently been developed as a rapid reaction to the need for fast detection, understanding and treatment of COVID-19. To control the ongoing COVID-19 pandemic, it is of utmost importance to get insight into the evolution and pathogenesis of the virus. In this review, we cover bioinformatics workflows and tools for the routine detection of SARS-CoV-2 infection, the reliable analysis of sequencing data, the tracking of the COVID-19 pandemic and evaluation of containment measures, the study of coronavirus evolution, the discovery of potential drug targets and development of therapeutic strategies. For each tool, we briefly describe its use case and how it advances research specifically for SARS-CoV-2. All tools are free to use and available online, either through web applications or public code repositories. Contact: evbc@unj-jena.de.


Subject(s)
COVID-19/prevention & control , Computational Biology , SARS-CoV-2/isolation & purification , Biomedical Research , COVID-19/epidemiology , COVID-19/virology , Genome, Viral , Humans , Pandemics , SARS-CoV-2/genetics
14.
Bioinformatics ; 38(Suppl_2): ii141-ii147, 2022 09 16.
Article in English | MEDLINE | ID: mdl-36124800

ABSTRACT

MOTIVATION: As complex tissues are typically composed of various cell types, deconvolution tools have been developed to computationally infer their cellular composition from bulk RNA sequencing (RNA-seq) data. To comprehensively assess deconvolution performance, gold-standard datasets are indispensable. Gold-standard, experimental techniques like flow cytometry or immunohistochemistry are resource-intensive and cannot be systematically applied to the numerous cell types and tissues profiled with high-throughput transcriptomics. The simulation of 'pseudo-bulk' data, generated by aggregating single-cell RNA-seq expression profiles in pre-defined proportions, offers a scalable and cost-effective alternative. This makes it feasible to create in silico gold standards that allow fine-grained control of cell-type fractions not conceivable in an experimental setup. However, at present, no simulation software for generating pseudo-bulk RNA-seq data exists. RESULTS: We developed SimBu, an R package capable of simulating pseudo-bulk samples based on various simulation scenarios, designed to test specific features of deconvolution methods. A unique feature of SimBu is the modeling of cell-type-specific mRNA bias using experimentally derived or data-driven scaling factors. Here, we show that SimBu can generate realistic pseudo-bulk data, recapitulating the biological and statistical features of real RNA-seq data. Finally, we illustrate the impact of mRNA bias on the evaluation of deconvolution tools and provide recommendations for the selection of suitable methods for estimating mRNA content. SimBu is a user-friendly and flexible tool for simulating realistic pseudo-bulk RNA-seq datasets serving as in silico gold-standard for assessing cell-type deconvolution methods. AVAILABILITY AND IMPLEMENTATION: SimBu is freely available at https://github.com/omnideconv/SimBu as an R package under the GPL-3 license. 
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
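
The pseudo-bulk idea described above, aggregating single-cell profiles in pre-defined proportions with an optional cell-type-specific scaling factor, can be sketched in a few lines. SimBu itself is an R package; the Python function below is a hypothetical illustration of the principle, with invented names, arguments and toy counts, and makes no claim about how SimBu works internally.

```python
import random


def simulate_pseudo_bulk(profiles, fractions, n_cells, scaling=None, seed=0):
    """Aggregate single-cell profiles into one pseudo-bulk sample.

    profiles: cell type -> list of per-cell expression vectors (lists of counts)
    fractions: cell type -> desired fraction of cells, summing to 1
    scaling: optional cell type -> factor modelling cell-type-specific mRNA content
    """
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)
    scaling = scaling or {}
    n_genes = len(next(iter(profiles.values()))[0])
    bulk = [0.0] * n_genes
    for cell_type, fraction in fractions.items():
        n_type = round(fraction * n_cells)
        factor = scaling.get(cell_type, 1.0)
        # Sample cells of this type with replacement and add their scaled counts.
        for cell in rng.choices(profiles[cell_type], k=n_type):
            for gene, count in enumerate(cell):
                bulk[gene] += factor * count
    return bulk


# Hypothetical two-gene profiles; cells of one type are identical here so
# that the result does not depend on which cells are sampled.
profiles = {
    "T cell": [[10, 0], [10, 0]],
    "tumor cell": [[0, 4], [0, 4]],
}
fractions = {"T cell": 0.75, "tumor cell": 0.25}
scaling = {"tumor cell": 2.0}  # assumed higher mRNA content per tumor cell

bulk = simulate_pseudo_bulk(profiles, fractions, n_cells=8, scaling=scaling)
print(bulk)
```

Because the true fractions are known by construction, the output of a deconvolution tool on such a sample can be compared directly against them.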


Subject(s)
Gene Expression Profiling , RNA , Gene Expression Profiling/methods , RNA/genetics , RNA, Messenger , RNA-Seq , Sequence Analysis, RNA/methods
15.
Bioinformatics ; 38(6): 1600-1606, 2022 03 04.
Article in English | MEDLINE | ID: mdl-34984440

ABSTRACT

MOTIVATION: Disease module mining methods (DMMMs) extract subgraphs that constitute candidate disease mechanisms from molecular interaction networks such as protein-protein interaction (PPI) networks. Irrespective of the employed models, DMMMs typically include non-robust steps in their workflows, i.e. the computed subnetworks vary when running the DMMMs multiple times on equivalent input. This lack of robustness has a negative effect on the trustworthiness of the obtained subnetworks and is hence detrimental for the widespread adoption of DMMMs in the biomedical sciences. RESULTS: To overcome this problem, we present a new DMMM called ROBUST (robust disease module mining via enumeration of diverse prize-collecting Steiner trees). In a large-scale empirical evaluation, we show that ROBUST outperforms competing methods in terms of robustness, scalability and, in most settings, functional relevance of the produced modules, measured via KEGG (Kyoto Encyclopedia of Genes and Genomes) gene set enrichment scores and overlap with DisGeNET disease genes. AVAILABILITY AND IMPLEMENTATION: A Python 3 implementation and scripts to reproduce the results reported in this article are available on GitHub: https://github.com/bionetslab/robust, https://github.com/bionetslab/robust-eval. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Trees , Computational Biology/methods , Protein Interaction Maps
16.
Nucleic Acids Res ; 49(D1): D309-D318, 2021 01 08.
Article in English | MEDLINE | ID: mdl-32976589

ABSTRACT

Alternative splicing plays a major role in regulating the functional repertoire of the proteome. However, isoform-specific effects on protein-protein interactions (PPIs) are usually overlooked, making it impossible to judge the functional role of individual exons on a systems biology level. We overcome this barrier by integrating protein-protein interactions, domain-domain interactions and residue-level interactions information to lift exon expression analysis to a network level. Our user-friendly database DIGGER is available at https://exbio.wzw.tum.de/digger and allows users to seamlessly switch between isoform and exon-centric views of the interactome and to extract sub-networks of relevant isoforms, making it an essential resource for studying mechanistic consequences of alternative splicing.


Subject(s)
Alternative Splicing , Databases, Protein , Exons , Protein Interaction Mapping/methods , Proteome/chemistry , RNA, Messenger/genetics , Binding Sites , Computational Biology/methods , Humans , Internet , Models, Molecular , Protein Binding , Protein Biosynthesis , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Interaction Domains and Motifs , Protein Isoforms , Proteome/genetics , Proteome/metabolism , RNA, Messenger/metabolism , Software , Thermodynamics
17.
J Med Internet Res ; 25: e42621, 2023 07 12.
Article in English | MEDLINE | ID: mdl-37436815

ABSTRACT

BACKGROUND: Machine learning and artificial intelligence have shown promising results in many areas and are driven by the increasing amount of available data. However, these data are often distributed across different institutions and cannot be easily shared owing to strict privacy regulations. Federated learning (FL) allows the training of distributed machine learning models without sharing sensitive data. However, the implementation is time-consuming and requires advanced programming skills and complex technical infrastructures. OBJECTIVE: Various tools and frameworks have been developed to simplify the development of FL algorithms and provide the necessary technical infrastructure. Although there are many high-quality frameworks, most focus only on a single application case or method. To our knowledge, there are no generic frameworks, meaning that the existing solutions are restricted to a particular type of algorithm or application field. Furthermore, most of these frameworks provide an application programming interface that needs programming knowledge. There is no collection of ready-to-use FL algorithms that are extendable and allow users (eg, researchers) without programming knowledge to apply FL. A central FL platform for both FL algorithm developers and users does not exist. This study aimed to address this gap and make FL available to everyone by developing FeatureCloud, an all-in-one platform for FL in biomedicine and beyond. METHODS: The FeatureCloud platform consists of 3 main components: a global frontend, a global backend, and a local controller. Our platform uses Docker to separate the locally acting components of the platform from the sensitive data systems. We evaluated our platform using 4 different algorithms on 5 data sets for both accuracy and runtime.
RESULTS: FeatureCloud removes the complexity of distributed systems for developers and end users by providing a comprehensive platform for executing multi-institutional FL analyses and implementing FL algorithms. Through its integrated artificial intelligence store, federated algorithms can easily be published and reused by the community. To secure sensitive raw data, FeatureCloud supports privacy-enhancing technologies to secure the shared local models and assures high standards in data privacy to comply with the strict General Data Protection Regulation. Our evaluation shows that applications developed in FeatureCloud can produce highly similar results compared with centralized approaches and scale well for an increasing number of participating sites. CONCLUSIONS: FeatureCloud provides a ready-to-use platform that integrates the development and execution of FL algorithms while reducing the complexity to a minimum and removing the hurdles of federated infrastructure. Thus, we believe that it has the potential to greatly increase the accessibility of privacy-preserving and distributed data analyses in biomedicine and beyond.
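
The core idea of federated learning, sharing model parameters in place of raw data, can be illustrated with federated averaging, a widely used aggregation scheme. The abstract does not state which aggregation the FeatureCloud apps use, so the sketch below is a generic example and not the FeatureCloud API: the function name, the site data and the parameter values are assumptions.

```python
def federated_average(local_updates):
    """Sample-size-weighted average of local model parameters.

    local_updates: list of (weights, n_samples) tuples, one per site.
    Only these parameters leave a site; the raw records stay local.
    """
    total = sum(n for _, n in local_updates)
    if total == 0:
        raise ValueError("no samples reported")
    n_params = len(local_updates[0][0])
    global_weights = [0.0] * n_params
    for weights, n in local_updates:
        if len(weights) != n_params:
            raise ValueError("all sites must report the same number of parameters")
        for k, w in enumerate(weights):
            global_weights[k] += w * n / total
    return global_weights


# Hypothetical parameters fitted locally at two hospitals.
site_a = ([1.0, 2.0], 100)  # (model weights, number of local samples)
site_b = ([3.0, 6.0], 300)

global_model = federated_average([site_a, site_b])
print(global_model)
```

Weighting by sample size lets larger sites contribute proportionally more. The privacy-enhancing technologies mentioned in the abstract would additionally protect the shared parameters themselves, which this sketch does not attempt.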


Subject(s)
Algorithms , Artificial Intelligence , Humans , Health Occupations , Software , Computer Communication Networks , Privacy
18.
Bioinformatics ; 37(12): 1708-1716, 2021 07 19.
Article in English | MEDLINE | ID: mdl-33252645

ABSTRACT

MOTIVATION: Recently, various tools for detecting single nucleotide polymorphisms (SNPs) involved in epistasis have been developed. However, no studies evaluate the employed statistical epistasis models such as the χ²-test or quadratic regression independently of the tools that use them. Such an independent evaluation is crucial for developing improved epistasis detection tools, because it makes it possible to decide whether a tool's performance should be attributed to the epistasis model or to the optimization strategy run on top of it. RESULTS: We present a protocol for evaluating epistasis models independently of the tools they are used in and generalize existing models designed for dichotomous phenotypes to the categorical and quantitative case. In addition, we propose a new model which scores candidate SNP sets by computing maximum likelihood distributions for the observed phenotypes in the cells of their penetrance tables. Extensive experiments show that the proposed maximum likelihood model outperforms three widely used epistasis models in most cases. The experiments also provide valuable insights into the properties of existing models, for instance, that quadratic regression performs particularly well on instances with quantitative phenotypes. AVAILABILITY AND IMPLEMENTATION: The evaluation protocol and all compared models are implemented in C++ and are supported under Linux and macOS. They are available at https://github.com/baumbachlab/genepiseeker/, along with test datasets and scripts to reproduce the experiments. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
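
As an illustration of what scoring a candidate SNP set with a χ²-type model involves, the sketch below computes the Pearson chi-square statistic for a table of genotype combinations versus a dichotomous phenotype. It is a plain textbook statistic, not the maximum likelihood model proposed in the paper and not the C++ code from the linked repository; the function name and the toy genotype data are assumptions.

```python
from collections import Counter


def chi_square_statistic(genotypes, phenotypes):
    """Pearson chi-square statistic for genotype combinations vs. a binary phenotype.

    genotypes: list of tuples, one per individual, e.g. (snp1, snp2) coded 0/1/2
    phenotypes: list of 0/1 labels of the same length
    """
    if len(genotypes) != len(phenotypes):
        raise ValueError("inputs must have the same length")
    n = len(genotypes)
    observed = Counter(zip(genotypes, phenotypes))
    row_totals = Counter(genotypes)   # one row per observed genotype combination
    col_totals = Counter(phenotypes)  # one column per phenotype class
    statistic = 0.0
    for cell in row_totals:
        for label in col_totals:
            expected = row_totals[cell] * col_totals[label] / n
            statistic += (observed[(cell, label)] - expected) ** 2 / expected
    return statistic


# Hypothetical SNP pair observed in 20 individuals.
genotypes = [(0, 0)] * 10 + [(1, 1)] * 10
associated = [0] * 10 + [1] * 10  # phenotype follows the genotype combination
unrelated = [0, 1] * 10           # phenotype independent of the genotypes

score_associated = chi_square_statistic(genotypes, associated)
score_unrelated = chi_square_statistic(genotypes, unrelated)
print("associated phenotype:", score_associated)
print("unrelated phenotype:", score_unrelated)
```

Evaluating such a scoring function on its own, separately from the search strategy that proposes SNP sets, is the kind of separation the protocol in the abstract argues for.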


Subject(s)
Epistasis, Genetic , Polymorphism, Single Nucleotide , Phenotype , Probability
19.
Bioinformatics ; 37(16): 2398-2404, 2021 Aug 25.
Article in English | MEDLINE | ID: mdl-33367514

ABSTRACT

MOTIVATION: Unsupervised learning approaches are frequently used to stratify patients into clinically relevant subgroups and to identify biomarkers such as disease-associated genes. However, clustering and biclustering techniques are oblivious to the functional relationship of genes and are thus not ideally suited to pinpoint molecular mechanisms along with patient subgroups. RESULTS: We developed the network-constrained biclustering approach Biclustering Constrained by Networks (BiCoN) which (i) restricts biclusters to functionally related genes connected in molecular interaction networks and (ii) maximizes the difference in gene expression between two subgroups of patients. This allows BiCoN to simultaneously pinpoint molecular mechanisms responsible for the patient grouping. Network-constrained clustering of genes makes BiCoN more robust to noise and batch effects than typical clustering and biclustering methods. BiCoN can faithfully reproduce known disease subtypes as well as novel, clinically relevant patient subgroups, as we could demonstrate using breast and lung cancer datasets. In summary, BiCoN is a novel systems medicine tool that combines several heuristic optimization strategies for robust disease mechanism extraction. BiCoN is well-documented and freely available as a Python package or a web interface. AVAILABILITY AND IMPLEMENTATION: PyPI package: https://pypi.org/project/bicon. WEB INTERFACE: https://exbio.wzw.tum.de/bicon. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

20.
Bioinformatics ; 37(18): 3008-3010, 2021 09 29.
Article in English | MEDLINE | ID: mdl-33647976

ABSTRACT

SUMMARY: A plethora of tools exist for RNA-Seq data analysis with a focus on alternative splicing (AS). However, appropriate data for their comparative evaluation is missing. The R package ASimulatoR simulates gold standard RNA-Seq datasets with fine-grained control over the distribution of AS events, which allow for evaluating alternative splicing tools, e.g. to study the effect of sequencing depth on the performance of AS event detection. AVAILABILITY AND IMPLEMENTATION: ASimulatoR is freely available at https://github.com/biomedbigdata/ASimulatoR as an R package under GPL-3 license.


Subject(s)
Alternative Splicing , Software , RNA-Seq , Sequence Analysis, RNA , Computer Simulation