1.
Brief Bioinform; 25(2), 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38446741

ABSTRACT

Identifying protein-protein interactions (PPIs) is crucial for deciphering biological pathways. Numerous prediction methods have been developed as cheap alternatives to biological experiments, and they report surprisingly high accuracy estimates. We systematically investigated how much reproducible deep learning models depend on data leakage, sequence similarities and node degree information, and compared them with basic machine learning models. We found that overlaps between training and test sets resulting from random splitting lead to strongly overestimated performance. In this setting, models learn solely from sequence similarities and node degrees. When data leakage is avoided by minimizing sequence similarities between training and test sets, performance drops to random. Moreover, baseline models that directly leverage sequence similarity and network topology perform well at a fraction of the computational cost. We therefore advocate that future improvements be reported relative to such baseline methods. Our findings suggest that predicting PPIs remains an unsolved task for proteins showing little sequence similarity to previously studied proteins, highlighting that further experimental research into the 'dark' protein interactome and better computational methods are needed.
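The leakage effect described above can be illustrated with a toy sketch. It contrasts a random split of interaction pairs with a split in which test pairs involve only held-out proteins, and it measures how many test pairs consist of proteins already seen during training. Protein identity is used here as a simplified stand-in for the sequence-similarity-based partitioning the study describes; the function names and the toy network are hypothetical.

```python
import itertools
import random

def random_pair_split(pairs, test_frac=0.2, seed=0):
    """Shuffle interaction pairs and cut them into train/test sets (random splitting)."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]

def protein_disjoint_split(pairs, test_proteins):
    """Keep pairs lying fully inside or fully outside the held-out protein set.

    Pairs with one protein on each side are dropped, so no test protein
    ever appears in training (a simplification of similarity-based splitting).
    """
    train, test = [], []
    for a, b in pairs:
        in_a, in_b = a in test_proteins, b in test_proteins
        if in_a and in_b:
            test.append((a, b))
        elif not in_a and not in_b:
            train.append((a, b))
    return train, test

def leaked_fraction(train, test):
    """Fraction of test pairs whose two proteins both occur in the training pairs."""
    seen = {protein for pair in train for protein in pair}
    if not test:
        return 0.0
    return sum(a in seen and b in seen for a, b in test) / len(test)

# Hypothetical toy interactome: every pair among 10 proteins interacts.
proteins = [f"P{i}" for i in range(10)]
pairs = list(itertools.combinations(proteins, 2))

train_r, test_r = random_pair_split(pairs)
train_d, test_d = protein_disjoint_split(pairs, set(proteins[:4]))
print(leaked_fraction(train_r, test_r), leaked_fraction(train_d, test_d))
```

On this dense toy graph, random splitting almost certainly leaves every test pair made of proteins already seen in training. The protein-disjoint split brings that fraction to zero, at the cost of discarding all cross-partition pairs.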


Subjects
Machine Learning
2.
Nature; 579(7799): 409-414, 2020 Mar.
Article in English | MEDLINE | ID: mdl-32188942

ABSTRACT

Plants are essential for life and are extremely diverse organisms with unique molecular capabilities [1]. Here we present a quantitative atlas of the transcriptomes, proteomes and phosphoproteomes of 30 tissues of the model plant Arabidopsis thaliana. Our analysis provides initial answers to how many genes exist as proteins (more than 18,000), where they are expressed, in which approximate quantities (a dynamic range of more than six orders of magnitude) and to what extent they are phosphorylated (over 43,000 sites). We present examples of how the data may be used, such as to discover proteins that are translated from short open-reading frames, to uncover sequence motifs that are involved in the regulation of protein production, and to identify tissue-specific protein complexes or phosphorylation-mediated signalling events. Interactive access to this resource for the plant community is provided by the ProteomicsDB and ATHENA databases, which include powerful bioinformatics tools to explore and characterize Arabidopsis proteins, their modifications and interactions.


Subjects
Arabidopsis Proteins/analysis; Arabidopsis Proteins/chemistry; Arabidopsis/chemistry; Mass Spectrometry; Proteome/analysis; Proteome/chemistry; Proteomics; Amino Acid Motifs; Arabidopsis/anatomy & histology; Arabidopsis/genetics; Arabidopsis/metabolism; Arabidopsis Proteins/biosynthesis; Arabidopsis Proteins/genetics; Databases, Protein; Datasets as Topic; Gene Expression Regulation, Plant; Molecular Sequence Annotation; Open Reading Frames; Organ Specificity; Phosphoproteins/analysis; Phosphoproteins/chemistry; Phosphoproteins/genetics; Phosphorylation; Proteome/biosynthesis; Proteome/genetics; RNA, Messenger/analysis; RNA, Messenger/biosynthesis; RNA, Messenger/genetics; Transcriptome
3.
Nucleic Acids Res; 2024 May 23.
Article in English | MEDLINE | ID: mdl-38783119

ABSTRACT

In recent decades, the development of new drugs has become increasingly expensive and inefficient, and the molecular mechanisms of most pharmaceuticals remain poorly understood. In response, computational systems and network medicine tools have emerged to identify potential drug repurposing candidates. However, these tools often require complex installation and lack intuitive visual network mining capabilities. To tackle these challenges, we introduce Drugst.One, a platform that assists specialized computational medicine tools in becoming user-friendly, web-based utilities for drug repurposing. With just three lines of code, Drugst.One turns any systems biology software into an interactive web tool for modeling and analyzing complex protein-drug-disease networks. Demonstrating its broad adaptability, Drugst.One has been successfully integrated with 21 computational systems medicine tools. Available at https://drugst.one, Drugst.One has significant potential for streamlining the drug discovery process, allowing researchers to focus on essential aspects of pharmaceutical treatment research.

4.
Brief Bioinform; 24(5), 2023 Sep 20.
Article in English | MEDLINE | ID: mdl-37670505

ABSTRACT

A key problem in systems biology is the discovery of regulatory mechanisms that drive the phenotypic behaviour of complex biological systems in the form of multi-level networks. Modern multi-omics profiling techniques probe these fundamental regulatory networks but are often hampered by experimental restrictions, such as cost, that lead to missing data or to omics types being measured only for subsets of individuals. In such scenarios, classical computational approaches to infer regulatory networks are limited. In recent years, approaches have been proposed to infer sparse regression models in the presence of missing information. Nevertheless, these methods have not yet been adopted for regulatory network inference. In this study, we integrated regression-based methods that can handle missingness into KiMONo, a Knowledge-guided Multi-Omics Network inference approach, and benchmarked their performance on commonly encountered missing-data scenarios in single- and multi-omics studies. Overall, two-step approaches that explicitly handle missingness performed best for a wide range of random- and block-missingness scenarios on omics layers of imbalanced dimensions, while methods that implicitly handle missingness performed best on omics layers of balanced dimensions. Our results show that robust multi-omics network inference in the presence of missing data is feasible with KiMONo, allowing users to leverage available multi-omics data to its full extent.


Subjects
Benchmarking; Multiomics; Humans; Systems Biology
5.
Brief Bioinform; 23(1), 2022 Jan 17.
Article in English | MEDLINE | ID: mdl-34850807

ABSTRACT

Cytometry techniques are widely used to discover cellular characteristics at single-cell resolution. Many data analysis methods for cytometry data focus solely on identifying subpopulations via clustering and testing for differential cell abundance. For differential expression analysis of markers between conditions, only a few tools exist. These tools either reduce the data distribution to medians, discarding valuable information, or rely on assumptions that may not hold for all expression patterns. Here, we systematically evaluated existing and novel approaches for differential expression analysis on real and simulated CyTOF data. We found that methods using median marker expressions compute fast and reliable results when the data are not strongly zero-inflated. Methods using all data detect changes in strongly zero-inflated markers, but some of them suffer from overprediction or cannot handle large datasets. We present a new method, CyEMD, which is based on the earth mover's distance between expression distributions and can handle strong zero-inflation without being overly sensitive. Additionally, we developed CYANUS (CYtometry ANalysis Using Shiny), a user-friendly R Shiny app that allows users to analyze cytometry data with state-of-the-art tools, including well-performing methods from our comparison. A public web interface is available at https://exbio.wzw.tum.de/cyanus/.
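CyEMD builds on the earth mover's distance (EMD) between expression distributions. As a minimal sketch, assuming raw 1-D marker intensities, the EMD between two empirical distributions equals the area between their cumulative distribution functions. The sketch computes only the distance; it does not reproduce any binning or how CyEMD derives significance. The function name and data are illustrative, and `scipy.stats.wasserstein_distance` computes the same quantity for 1-D samples.

```python
def emd_1d(x, y):
    """Earth mover's distance between two non-empty 1-D samples.

    Computed as the integral of |F_x(t) - F_y(t)| over t, where F_x and F_y
    are the empirical cumulative distribution functions.
    """
    xs, ys = sorted(x), sorted(y)
    points = sorted(set(xs) | set(ys))
    i = j = 0
    total = 0.0
    for left, right in zip(points, points[1:]):
        # advance pointers so that i and j count the samples <= left
        while i < len(xs) and xs[i] <= left:
            i += 1
        while j < len(ys) and ys[j] <= left:
            j += 1
        # the CDFs are constant on [left, right), so integrate a rectangle
        total += abs(i / len(xs) - j / len(ys)) * (right - left)
    return total

# Zero-inflated toy marker: same nonzero level, different fraction of zeros.
control = [0.0, 0.0, 0.0, 1.0]
treated = [0.0, 1.0, 1.0, 1.0]
print(emd_1d(control, treated))  # 0.5
```

Half of the probability mass has to move by one unit to turn one distribution into the other, which gives a distance of 0.5.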


Subjects
Cluster Analysis; Biomarkers
6.
Bioinformatics; 39(1), 2023 Jan 1.
Article in English | MEDLINE | ID: mdl-36579860

ABSTRACT

MOTIVATION: During disease progression or organism development, alternative splicing may lead to isoform switches (IS) that show similar temporal patterns across genes and reflect the co-regulation of these genes by alternative splicing. Tools for dynamic process analysis usually neglect alternative splicing. RESULTS: Here, we propose Spycone, a splicing-aware framework for time course data analysis. Spycone exploits a novel IS detection algorithm and offers downstream analyses such as network analysis and gene set enrichment. We demonstrate the performance of Spycone using simulated data and real-world data of SARS-CoV-2 infection. AVAILABILITY AND IMPLEMENTATION: Spycone is available as a PyPI package. The source code of Spycone is available under the GPLv3 license at https://github.com/yollct/spycone and the documentation at https://spycone.readthedocs.io/en/latest/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
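In the sense used above, an isoform switch is a change in which isoform of a gene dominates expression over time. The following deliberately simple sketch is not Spycone's detection algorithm. It flags the time points at which the sign of the expression difference between two isoforms flips. The function name and values are hypothetical.

```python
def switch_points(iso_a, iso_b):
    """Return time indices t at which the dominant isoform changes between t-1 and t.

    iso_a, iso_b: expression of two isoforms of one gene at consecutive time points.
    A difference of exactly zero counts as "no dominant isoform", so a switch
    that passes through an exact tie is not reported.
    """
    if len(iso_a) != len(iso_b):
        raise ValueError("isoform time series must have equal length")
    points = []
    for t in range(1, len(iso_a)):
        before = iso_a[t - 1] - iso_b[t - 1]
        after = iso_a[t] - iso_b[t]
        if before * after < 0:  # sign flip: the other isoform now dominates
            points.append(t)
    return points

# Isoform A dominates early; isoform B takes over from the third time point on.
print(switch_points([10, 8, 4, 2], [2, 5, 6, 9]))  # [2]
```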


Subjects
Alternative Splicing; COVID-19; Humans; SARS-CoV-2/genetics; Software; Algorithms
7.
Bioinformatics; 39(6), 2023 Jun 1.
Article in English | MEDLINE | ID: mdl-37233198

ABSTRACT

SUMMARY: We present ROBUST-Web, which implements our recently presented ROBUST disease module mining algorithm in a user-friendly web application. ROBUST-Web features seamless downstream disease module exploration via integrated gene set enrichment analysis, tissue expression annotation, and visualization of drug-protein and disease-gene links. Moreover, ROBUST-Web includes bias-aware edge costs for the underlying Steiner tree model as a new algorithmic feature, which allow correcting for study bias in protein-protein interaction networks and further improve the robustness of the computed modules. AVAILABILITY AND IMPLEMENTATION: Web application: https://robust-web.net. Source code of web application and Python package with new bias-aware edge costs: https://github.com/bionetslab/robust-web, https://github.com/bionetslab/robust_bias_aware.


Subjects
Algorithms; Software; Protein Interaction Maps
8.
Bioinformatics; 39(5), 2023 May 4.
Article in English | MEDLINE | ID: mdl-37084275

ABSTRACT

MOTIVATION: Cancer is one of the leading causes of death worldwide. Despite significant improvements in prevention and treatment, mortality remains high for many cancer types. Hence, innovative methods that use molecular data to stratify patients and identify biomarkers are needed. Promising biomarkers can also be inferred from competing endogenous RNA (ceRNA) networks that capture the gene-miRNA gene regulatory landscape. Thus far, the role of these biomarkers could only be studied globally but not in a sample-specific manner. To mitigate this, we introduce spongEffects, a novel method that infers subnetworks (or modules) from ceRNA networks and calculates patient- or sample-specific scores related to their regulatory activity. RESULTS: We show how spongEffects can be used for downstream interpretation and machine learning tasks such as tumor classification and for identifying subtype-specific regulatory interactions. In a concrete example of breast cancer subtype classification, we prioritize modules impacting the biology of the different subtypes. In summary, spongEffects prioritizes ceRNA modules as biomarkers and offers insights into the miRNA regulatory landscape. Notably, these module scores can be inferred from gene expression data alone and can thus be applied to cohorts where miRNA expression information is lacking. AVAILABILITY AND IMPLEMENTATION: https://bioconductor.org/packages/devel/bioc/html/SPONGE.html.


Subjects
Breast Neoplasms; MicroRNAs; RNA, Long Noncoding; Humans; Female; MicroRNAs/genetics; MicroRNAs/metabolism; Gene Regulatory Networks; Breast Neoplasms/genetics; Machine Learning; Gene Expression Regulation, Neoplastic; RNA, Long Noncoding/genetics
9.
Platelets; 35(1): 2358244, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38845541

ABSTRACT

Thromboembolic events are common in patients with essential thrombocythemia (ET). However, the pathophysiological mechanisms underlying the increased thrombotic risk remain to be determined. Here, we performed the first phenotypic characterization of platelets using single-cell mass cytometry in six ET patients and six age- and sex-matched healthy individuals. A large panel of 18 transmembrane regulators of platelet function and activation was analyzed at baseline and after ex vivo stimulation with thrombin receptor-activating peptide (TRAP). We detected significant overexpression of the activation marker CD62P (P-selectin) (p = .049) and the collagen receptor GPVI (p = .044) in non-stimulated ET platelets. In contrast, ET platelets showed lower expression of the fibrinogen receptor GPIIb/IIIa integrin subunits CD41 (p = .036) and CD61 (p = .044) and of the von Willebrand factor receptor CD42b (p = .044). Using the FlowSOM algorithm, we identified two subclusters of ET platelets with a prothrombotic expression profile, one of which (cluster 3) was significantly overrepresented in ET (22.13% of all platelets in ET vs. 2.94% in controls, p = .035). Platelet counts were significantly higher in ET than in controls (p = .0123). In ET, mean platelet volume (MPV) correlated inversely with platelet count (r = -0.96). These data highlight the prothrombotic phenotype of ET and point to GPVI as a potential target to prevent thrombosis in these patients.


Essential thrombocythemia (ET) is a rare disease characterized by an increased number of platelets in the blood. Many of these patients develop blood clots as a complication, which can be life-threatening. So far, the reason behind the higher risk of blood clots is unclear. In this study, we used a modern technology called mass cytometry to analyze platelet surface markers that play a critical role in platelet function and activation. For this purpose, blood samples from 6 patients with ET and 6 healthy control individuals were analyzed. We found significant differences between ET platelets and healthy platelets. ET platelets had higher expression levels of P-selectin (CD62P), a key marker of platelet activation, and of the collagen receptor GPVI, which is important for clot formation. These results may be driven by a specific platelet subcluster that is overrepresented in ET. Other surface markers, such as the fibrinogen receptor GPIIb/IIIa subunits CD41 and CD61 and the von Willebrand factor receptor CD42b, were expressed at lower levels on ET platelets. When platelets were stimulated with thrombin receptor-activating peptide (TRAP), which mimics the action of the clotting factor thrombin, ET platelets showed a different activation response from healthy platelets. In conclusion, our results show an increased activation and clotting potential of ET platelets. The platelet surface protein GPVI may be a potential drug target to prevent abnormal blood clotting in ET patients.


Subjects
Blood Platelets; Thrombocythemia, Essential; Thrombosis; Humans; Thrombocythemia, Essential/metabolism; Thrombocythemia, Essential/complications; Blood Platelets/metabolism; Male; Female; Thrombosis/metabolism; Thrombosis/etiology; Middle Aged; Aged; Flow Cytometry/methods; Platelet Activation; Case-Control Studies; Adult
10.
Nucleic Acids Res; 50(W1): W138-W144, 2022 Jul 5.
Article in English | MEDLINE | ID: mdl-35580047

ABSTRACT

Cancer is a heterogeneous disease characterized by unregulated cell growth and promoted by mutations in cancer driver genes, some of which encode suitable drug targets. Since the set of cancer driver genes can vary between and within cancer types, evidence-based selection of drugs is crucial for targeted therapy following the precision medicine paradigm. However, many putative cancer driver genes cannot be targeted directly, suggesting an indirect approach that considers alternative, functionally related targets in the gene interaction network. Once potential drug targets have been identified, it is essential to consider all available drugs. Since tools that support the systematic discovery of drug repurposing candidates in oncology are lacking, we developed CADDIE, a web application integrating six human gene-gene and four drug-gene interaction databases, information on cancer driver genes, cancer-type-specific mutation frequencies, gene expression information, genetically related diseases, and anticancer drugs. CADDIE offers access to various network algorithms for identifying drug targets and drug repurposing candidates. It guides users from the selection of seed genes to the identification of therapeutic targets or drug candidates, making network medicine algorithms accessible for clinical research. CADDIE is available at https://exbio.wzw.tum.de/caddie/ and programmatically via a Python package at https://pypi.org/project/caddiepy/.
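The indirect strategy described above looks for drug targets among genes that are functionally related to an undruggable driver. It can be sketched as a breadth-first search in a gene interaction network that records nearby genes with known drugs. This is an illustrative assumption and not one of the network algorithms CADDIE offers. The gene names, drug mapping and hop limit are hypothetical.

```python
from collections import deque

def druggable_neighbors(network, drugs, seed, max_hops=2):
    """Return {gene: hops} for genes with at least one known drug within max_hops of seed.

    network: dict gene -> set of interacting genes (undirected).
    drugs: dict gene -> list of drugs targeting that gene.
    The seed itself is excluded, since the aim is to find alternative targets.
    """
    hops = {seed: 0}
    queue = deque([seed])
    found = {}
    while queue:
        gene = queue.popleft()
        if gene != seed and drugs.get(gene):
            found[gene] = hops[gene]
        if hops[gene] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in network.get(gene, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[gene] + 1
                queue.append(neighbor)
    return found

# Hypothetical toy network: the seed driver has no drug, but nearby genes do.
network = {
    "DRIVER": {"G1", "G2"},
    "G1": {"DRIVER", "G3"},
    "G2": {"DRIVER"},
    "G3": {"G1", "G4"},
    "G4": {"G3"},
}
drugs = {"G2": ["drug_x"], "G3": ["drug_y"], "G4": ["drug_z"]}
print(druggable_neighbors(network, drugs, "DRIVER"))
```

A larger hop limit widens the candidate set (here it would add G4), trading specificity for coverage.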


Subjects
Antineoplastic Agents; Neoplasms; Humans; Neoplasms/drug therapy; Neoplasms/genetics; Antineoplastic Agents/pharmacology; Antineoplastic Agents/therapeutic use; Software; Oncogenes; Algorithms; Mutation; Drug Interactions; Drug Repositioning
11.
Proteomics; 23(23-24): e2200462, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37706624

ABSTRACT

Transcription factors (TFs) are essential players in orchestrating the regulatory landscape in cells. Still, their exact modes of action and dependencies on other regulatory aspects remain elusive. Since TFs act in a cell-type-specific manner and each TF has its own characteristics, untangling their regulatory interactions experimentally is laborious and convoluted. Thus, computational tools are continually being developed that estimate transcription factor activity (TFA) from a variety of data modalities, either based on a mapping of TFs to their putative target genes or in a genome-wide, gene-unspecific fashion. These tools can help to gain insights into TF regulation and to prioritize candidates for experimental validation. Here, we give an overview of available computational tools that estimate TFA, illustrate examples of their application, discuss common strategies for validating their results, and consider their assumptions and the resulting limitations.
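The target-gene-based family of tools mentioned above can be illustrated with a simple activity estimate. It scores a TF by the signed mean of its targets' standardized expression, counting activated targets positively and repressed targets negatively. This is a minimal sketch under that assumption and does not reproduce any specific published tool. The regulon and values are hypothetical.

```python
def tf_activity(z_scores, regulon):
    """Signed mean of target-gene z-scores as a TF activity estimate.

    z_scores: dict gene -> standardized expression in one sample.
    regulon: dict target gene -> +1 (activated by the TF) or -1 (repressed).
    Targets missing from z_scores are skipped; returns None if none are measured.
    """
    measured = [(gene, sign) for gene, sign in regulon.items() if gene in z_scores]
    if not measured:
        return None
    return sum(sign * z_scores[gene] for gene, sign in measured) / len(measured)

sample = {"g1": 2.0, "g2": -1.0, "g3": 0.5}
regulon = {"g1": +1, "g2": -1, "g4": +1}  # g4 was not measured in this sample
print(tf_activity(sample, regulon))  # 1.5
```

A high score here means activated targets are up and repressed targets are down. This also shows the assumption such tools depend on: the regulon (TF-target mapping and signs) must be correct.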


Subjects
Gene Expression Regulation; Transcription Factors; Transcription Factors/metabolism; Genome; Computational Biology; Gene Regulatory Networks
12.
Brief Bioinform; 22(5), 2021 Sep 2.
Article in English | MEDLINE | ID: mdl-33782690

ABSTRACT

In network and systems medicine, active module identification methods (AMIMs) are widely used for discovering candidate molecular disease mechanisms. To this end, AMIMs combine network analysis algorithms with molecular profiling data, most commonly by projecting gene expression data onto generic protein-protein interaction (PPI) networks. Although active module identification has led to various novel insights into complex diseases, there is increasing awareness in the field that combining gene expression data with PPI networks is problematic, because up-to-date PPI networks have a very small diameter and are subject to both technical and literature bias. In this paper, we report the results of an extensive study in which we analyzed, for the first time, whether widely used AMIMs really benefit from using PPI networks. Our results clearly show that, except for the recently proposed AMIM DOMINO, the tested AMIMs do not produce biologically more meaningful candidate disease modules on widely used PPI networks than on random networks with the same node degrees. AMIMs hence mainly learn from the node degrees and mostly fail to exploit the biological knowledge encoded in the edges of the PPI networks. This has far-reaching consequences for the field of active module identification. In particular, we suggest that novel algorithms are needed that overcome the degree bias of most existing AMIMs and/or work with customized, context-specific networks instead of generic PPI networks.
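Random networks "with the same node degrees," like the controls described above, can be generated by repeated double-edge swaps. Two edges (a, b) and (c, d) are rewired to (a, d) and (c, b), which leaves every node's degree unchanged. This is a minimal sketch, assuming an undirected simple graph given as an edge list. The study's own randomization procedure may differ, and NetworkX provides a comparable `double_edge_swap` function. The toy graph is hypothetical.

```python
import random
from collections import Counter

def degree_preserving_shuffle(edges, n_swaps, seed=0, max_tries=None):
    """Randomize an undirected simple graph while keeping all node degrees.

    Each accepted swap replaces edges (a, b) and (c, d) with (a, d) and (c, b).
    Swaps that would create self-loops or duplicate edges are rejected.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edge_list}
    max_tries = max_tries or 100 * n_swaps
    swaps = tries = 0
    while swaps < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if len({a, b, c, d}) < 4:
            continue  # shared node: swap would create a self-loop or change nothing
        new_1, new_2 = frozenset((a, d)), frozenset((c, b))
        if new_1 in present or new_2 in present:
            continue  # would duplicate an existing edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new_1, new_2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        swaps += 1
    return edge_list

def degrees(edges):
    """Node degree counts of an edge list."""
    return Counter(node for edge in edges for node in edge)

# Hypothetical toy network: a ring of 8 nodes plus two chords.
toy = [(k, (k + 1) % 8) for k in range(8)] + [(0, 4), (2, 6)]
shuffled = degree_preserving_shuffle(toy, n_swaps=20)
print(degrees(toy) == degrees(shuffled))  # True
```

If a module-identification method scores equally well on such degree-matched random networks, its results are driven by node degrees rather than by the biology encoded in specific edges.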


Subjects
Algorithms; Gene Expression; Protein Interaction Mapping/methods; Protein Interaction Maps/genetics; Systems Biology/methods; Amyotrophic Lateral Sclerosis/genetics; Amyotrophic Lateral Sclerosis/metabolism; Carcinoma, Non-Small-Cell Lung/genetics; Carcinoma, Non-Small-Cell Lung/metabolism; Colitis, Ulcerative/genetics; Colitis, Ulcerative/metabolism; Crohn Disease/genetics; Crohn Disease/metabolism; Humans; Huntington Disease/genetics; Huntington Disease/metabolism; Lung Neoplasms/genetics; Lung Neoplasms/metabolism; Phenotype; Proteins/genetics; Proteins/metabolism
13.
Brief Bioinform; 22(2): 642-663, 2021 Mar 22.
Article in English | MEDLINE | ID: mdl-33147627

ABSTRACT

SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a novel virus of the family Coronaviridae. The virus causes the infectious disease COVID-19. The biology of coronaviruses has been studied for many years. However, bioinformatics tools designed explicitly for SARS-CoV-2 have only recently been developed as a rapid reaction to the need for fast detection, understanding and treatment of COVID-19. To control the ongoing COVID-19 pandemic, it is of utmost importance to gain insight into the evolution and pathogenesis of the virus. In this review, we cover bioinformatics workflows and tools for the routine detection of SARS-CoV-2 infection, the reliable analysis of sequencing data, the tracking of the COVID-19 pandemic and evaluation of containment measures, the study of coronavirus evolution, and the discovery of potential drug targets and development of therapeutic strategies. For each tool, we briefly describe its use case and how it advances research specifically for SARS-CoV-2. All tools are free to use and available online, either through web applications or public code repositories. Contact: evbc@unj-jena.de.


Subjects
COVID-19/prevention & control; Computational Biology; SARS-CoV-2/isolation & purification; Biomedical Research; COVID-19/epidemiology; COVID-19/virology; Genome, Viral; Humans; Pandemics; SARS-CoV-2/genetics
14.
Bioinformatics; 38(Suppl_2): ii141-ii147, 2022 Sep 16.
Article in English | MEDLINE | ID: mdl-36124800

ABSTRACT

MOTIVATION: As complex tissues are typically composed of various cell types, deconvolution tools have been developed to computationally infer their cellular composition from bulk RNA sequencing (RNA-seq) data. To comprehensively assess deconvolution performance, gold-standard datasets are indispensable. Experimental gold-standard techniques such as flow cytometry or immunohistochemistry are resource-intensive and cannot be systematically applied to the numerous cell types and tissues profiled with high-throughput transcriptomics. The simulation of 'pseudo-bulk' data, generated by aggregating single-cell RNA-seq expression profiles in predefined proportions, offers a scalable and cost-effective alternative. This makes it feasible to create in silico gold standards that allow a fine-grained control of cell-type fractions that would not be achievable in an experimental setup. However, at present, no simulation software for generating pseudo-bulk RNA-seq data exists. RESULTS: We developed SimBu, an R package capable of simulating pseudo-bulk samples based on various simulation scenarios, designed to test specific features of deconvolution methods. A unique feature of SimBu is the modeling of cell-type-specific mRNA bias using experimentally derived or data-driven scaling factors. Here, we show that SimBu can generate realistic pseudo-bulk data, recapitulating the biological and statistical features of real RNA-seq data. Finally, we illustrate the impact of mRNA bias on the evaluation of deconvolution tools and provide recommendations for the selection of suitable methods for estimating mRNA content. SimBu is a user-friendly and flexible tool for simulating realistic pseudo-bulk RNA-seq datasets that serve as an in silico gold standard for assessing cell-type deconvolution methods. AVAILABILITY AND IMPLEMENTATION: SimBu is freely available at https://github.com/omnideconv/SimBu as an R package under the GPL-3 license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
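The pseudo-bulk idea above aggregates single-cell profiles in predefined proportions and can optionally weight cell types by their mRNA content. It can be sketched in a few lines. This hypothetical Python illustration shows the concept only and is not SimBu's R interface. The function name, sampling cells with replacement, and the counts-per-million normalization are all assumptions.

```python
import random

def simulate_pseudo_bulk(cells_by_type, fractions, n_cells=100, scaling=None, seed=0):
    """Sum sampled single-cell profiles into one pseudo-bulk sample (CPM-normalized).

    cells_by_type: dict cell type -> list of per-cell expression vectors (same gene order).
    fractions: dict cell type -> desired fraction of cells in the sample.
    scaling: optional dict cell type -> mRNA scaling factor (default 1.0).
    """
    rng = random.Random(seed)
    scaling = scaling or {}
    n_genes = len(next(iter(cells_by_type.values()))[0])
    bulk = [0.0] * n_genes
    for cell_type, fraction in fractions.items():
        factor = scaling.get(cell_type, 1.0)
        for _ in range(round(fraction * n_cells)):
            cell = rng.choice(cells_by_type[cell_type])  # sample with replacement
            for gene, value in enumerate(cell):
                bulk[gene] += factor * value
    total = sum(bulk)
    # counts per million, so samples with different cell numbers are comparable
    return [value / total * 1e6 for value in bulk] if total else bulk

# Hypothetical two-gene example: T cells express only gene 0, B cells only gene 1.
cells = {"T": [[10.0, 0.0]], "B": [[0.0, 10.0]]}
equal = simulate_pseudo_bulk(cells, {"T": 0.5, "B": 0.5}, n_cells=10)
biased = simulate_pseudo_bulk(cells, {"T": 0.5, "B": 0.5}, n_cells=10, scaling={"T": 3.0})
print(equal, biased)  # [500000.0, 500000.0] [750000.0, 250000.0]
```

With a scaling factor of 3 for T cells, the same 50:50 cell mixture yields a 75:25 expression split. A deconvolution method has to account for this kind of mRNA bias to recover the true cell fractions.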


Subjects
Gene Expression Profiling; RNA; Gene Expression Profiling/methods; RNA/genetics; RNA, Messenger; RNA-Seq; Sequence Analysis, RNA/methods
15.
Bioinformatics; 38(6): 1600-1606, 2022 Mar 4.
Article in English | MEDLINE | ID: mdl-34984440

ABSTRACT

MOTIVATION: Disease module mining methods (DMMMs) extract subgraphs that constitute candidate disease mechanisms from molecular interaction networks such as protein-protein interaction (PPI) networks. Irrespective of the employed models, DMMMs typically include non-robust steps in their workflows, i.e. the computed subnetworks vary when the DMMMs are run multiple times on equivalent input. This lack of robustness undermines the trustworthiness of the obtained subnetworks and hence hinders the widespread adoption of DMMMs in the biomedical sciences. RESULTS: To overcome this problem, we present a new DMMM called ROBUST (robust disease module mining via enumeration of diverse prize-collecting Steiner trees). In a large-scale empirical evaluation, we show that ROBUST outperforms competing methods in terms of robustness, scalability and, in most settings, functional relevance of the produced modules, measured via KEGG (Kyoto Encyclopedia of Genes and Genomes) gene set enrichment scores and overlap with DisGeNET disease genes. AVAILABILITY AND IMPLEMENTATION: A Python 3 implementation and scripts to reproduce the results reported in this article are available on GitHub: https://github.com/bionetslab/robust, https://github.com/bionetslab/robust-eval. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms; Trees; Computational Biology/methods; Protein Interaction Maps
16.
Nucleic Acids Res; 49(D1): D309-D318, 2021 Jan 8.
Article in English | MEDLINE | ID: mdl-32976589

ABSTRACT

Alternative splicing plays a major role in regulating the functional repertoire of the proteome. However, isoform-specific effects on protein-protein interactions (PPIs) are usually overlooked, making it impossible to judge the functional role of individual exons at the systems biology level. We overcome this barrier by integrating protein-protein interaction, domain-domain interaction and residue-level interaction information to lift exon expression analysis to the network level. Our user-friendly database DIGGER is available at https://exbio.wzw.tum.de/digger and allows users to seamlessly switch between isoform- and exon-centric views of the interactome and to extract subnetworks of relevant isoforms, making it an essential resource for studying the mechanistic consequences of alternative splicing.
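The exon-level reasoning above can be illustrated with a toy rule. Assume an isoform retains an interaction only if it contains every exon that encodes the interaction's binding domain or residue contacts. This is a simplified hypothetical sketch, not DIGGER's data model. The exon and partner identifiers are made up.

```python
def retained_interactions(isoform_exons, interaction_exons):
    """Return the interaction partners an isoform can still bind.

    isoform_exons: set of exon IDs included in the isoform.
    interaction_exons: dict partner -> set of exon IDs encoding the binding region.
    """
    return {
        partner
        for partner, required in interaction_exons.items()
        if required <= isoform_exons  # all required exons are present
    }

interactions = {"PARTNER_1": {"e2"}, "PARTNER_2": {"e3", "e4"}}
full_length = {"e1", "e2", "e3", "e4"}
exon3_skipped = {"e1", "e2", "e4"}
print(sorted(retained_interactions(full_length, interactions)))    # ['PARTNER_1', 'PARTNER_2']
print(sorted(retained_interactions(exon3_skipped, interactions)))  # ['PARTNER_1']
```

Under this rule, skipping exon e3 removes the interaction with PARTNER_2. This is the kind of isoform-specific change in the interactome that an exon-centric network view makes visible.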


Subjects
Alternative Splicing; Databases, Protein; Exons; Protein Interaction Mapping/methods; Proteome/chemistry; RNA, Messenger/genetics; Binding Sites; Computational Biology/methods; Humans; Internet; Models, Molecular; Protein Binding; Protein Biosynthesis; Protein Conformation, alpha-Helical; Protein Conformation, beta-Strand; Protein Interaction Domains and Motifs; Protein Isoforms; Proteome/genetics; Proteome/metabolism; RNA, Messenger/metabolism; Software; Thermodynamics
17.
J Med Internet Res; 25: e42621, 2023 Jul 12.
Article in English | MEDLINE | ID: mdl-37436815

ABSTRACT

BACKGROUND: Machine learning and artificial intelligence have shown promising results in many areas and are driven by the increasing amount of available data. However, these data are often distributed across different institutions and cannot be easily shared owing to strict privacy regulations. Federated learning (FL) allows the training of distributed machine learning models without sharing sensitive data. However, implementing FL is time-consuming and requires advanced programming skills and complex technical infrastructures. OBJECTIVE: Various tools and frameworks have been developed to simplify the development of FL algorithms and provide the necessary technical infrastructure. Although there are many high-quality frameworks, most focus only on a single application case or method. To our knowledge, there are no generic frameworks, meaning that the existing solutions are restricted to a particular type of algorithm or application field. Furthermore, most of these frameworks provide an application programming interface that requires programming knowledge. There is no collection of ready-to-use FL algorithms that are extendable and allow users (eg, researchers) without programming knowledge to apply FL. A central FL platform for both FL algorithm developers and users does not exist. This study aimed to address this gap and make FL available to everyone by developing FeatureCloud, an all-in-one platform for FL in biomedicine and beyond. METHODS: The FeatureCloud platform consists of 3 main components: a global frontend, a global backend, and a local controller. Our platform uses Docker to separate the locally acting components of the platform from the sensitive data systems. We evaluated our platform using 4 different algorithms on 5 data sets for both accuracy and runtime. RESULTS: FeatureCloud removes the complexity of distributed systems for developers and end users by providing a comprehensive platform for executing multi-institutional FL analyses and implementing FL algorithms. Through its integrated artificial intelligence store, federated algorithms can easily be published and reused by the community. To secure sensitive raw data, FeatureCloud supports privacy-enhancing technologies that protect the shared local models and ensures high data privacy standards in compliance with the strict General Data Protection Regulation. Our evaluation shows that applications developed in FeatureCloud can produce results highly similar to those of centralized approaches and scale well with an increasing number of participating sites. CONCLUSIONS: FeatureCloud provides a ready-to-use platform that integrates the development and execution of FL algorithms while reducing complexity to a minimum and removing the hurdles of federated infrastructure. Thus, we believe that it has the potential to greatly increase the accessibility of privacy-preserving and distributed data analyses in biomedicine and beyond.
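A typical aggregation step in federated learning is a sample-size-weighted average of the model parameters computed at each site. Under this scheme, only parameters leave a site and raw records never do. This federated-averaging sketch is a generic illustration under that assumption. It is not FeatureCloud's API, and it omits the privacy-enhancing technologies the platform applies to shared local models. The function name and numbers are hypothetical.

```python
def federated_average(local_params, sample_counts):
    """Weighted average of per-site parameter vectors.

    local_params: list of parameter vectors, one per site (same length).
    sample_counts: number of training samples at each site, used as weights.
    """
    if not local_params or len(local_params) != len(sample_counts):
        raise ValueError("need one sample count per site and at least one site")
    total = sum(sample_counts)
    n_params = len(local_params[0])
    return [
        sum(params[k] * count for params, count in zip(local_params, sample_counts)) / total
        for k in range(n_params)
    ]

# Two hypothetical sites: the larger site pulls the global model towards its estimate.
print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

Weighting by sample count makes the aggregate match what a centrally pooled average would give for simple statistics, which is one reason federated results can closely track centralized ones.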


Subjects
Algorithms; Artificial Intelligence; Humans; Health Occupations; Software; Computer Communication Networks; Privacy
18.
Bioinformatics; 37(12): 1708-1716, 2021 Jul 19.
Article in English | MEDLINE | ID: mdl-33252645

ABSTRACT

MOTIVATION: Recently, various tools for detecting single nucleotide polymorphisms (SNPs) involved in epistasis have been developed. However, no studies evaluate the employed statistical epistasis models, such as the χ² test or quadratic regression, independently of the tools that use them. Such an independent evaluation is crucial for developing improved epistasis detection tools, because it allows one to decide whether a tool's performance should be attributed to the epistasis model or to the optimization strategy run on top of it. RESULTS: We present a protocol for evaluating epistasis models independently of the tools they are used in, and we generalize existing models designed for dichotomous phenotypes to the categorical and quantitative case. In addition, we propose a new model that scores candidate SNP sets by computing maximum likelihood distributions for the observed phenotypes in the cells of their penetrance tables. Extensive experiments show that the proposed maximum likelihood model outperforms three widely used epistasis models in most cases. The experiments also provide valuable insights into the properties of existing models, for instance, that quadratic regression performs particularly well on instances with quantitative phenotypes. AVAILABILITY AND IMPLEMENTATION: The evaluation protocol and all compared models are implemented in C++ and are supported under Linux and macOS. They are available at https://github.com/baumbachlab/genepiseeker/, along with test datasets and scripts to reproduce the experiments. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
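For reference, the χ² model mentioned above can be sketched in two steps. First, build the contingency table for a pair of SNPs, with one row per observed two-locus genotype and one column per phenotype class. Then compute Pearson's χ² statistic on that table. The sketch covers only a dichotomous phenotype and returns the statistic without a p-value. The function names and toy genotypes are hypothetical. For a p-value, `scipy.stats.chi2_contingency` can be used; note that it applies Yates' continuity correction to 2 × 2 tables by default.

```python
def penetrance_table(snp1, snp2, phenotype):
    """Count controls (0) and cases (1) for each observed two-locus genotype.

    Genotypes are coded 0/1/2 (minor-allele counts); rows are sorted by genotype.
    Each row is [controls, cases].
    """
    counts = {}
    for g1, g2, y in zip(snp1, snp2, phenotype):
        counts.setdefault((g1, g2), [0, 0])[y] += 1
    return [counts[key] for key in sorted(counts)]

def chi_square(table):
    """Pearson's chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat

snp1 = [0, 0, 1, 1]
snp2 = [0, 0, 0, 0]
status = [0, 0, 1, 1]  # toy data: genotype (1, 0) occurs only in cases
table = penetrance_table(snp1, snp2, status)
print(table, chi_square(table))  # [[2, 0], [0, 2]] 4.0
```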


Subjects
Epistasis, Genetic; Polymorphism, Single Nucleotide; Phenotype; Probability
19.
Bioinformatics; 37(16): 2398-2404, 2021 Aug 25.
Article in English | MEDLINE | ID: mdl-33367514

ABSTRACT

MOTIVATION: Unsupervised learning approaches are frequently used to stratify patients into clinically relevant subgroups and to identify biomarkers such as disease-associated genes. However, clustering and biclustering techniques are oblivious to the functional relationships of genes and are thus not ideally suited to pinpoint molecular mechanisms along with patient subgroups. RESULTS: We developed the network-constrained biclustering approach Biclustering Constrained by Networks (BiCoN), which (i) restricts biclusters to functionally related genes connected in molecular interaction networks and (ii) maximizes the difference in gene expression between two subgroups of patients. This allows BiCoN to simultaneously pinpoint the molecular mechanisms responsible for the patient grouping. Network-constrained clustering of genes makes BiCoN more robust to noise and batch effects than typical clustering and biclustering methods. As we demonstrated using breast and lung cancer datasets, BiCoN can faithfully reproduce known disease subtypes and identify novel, clinically relevant patient subgroups. In summary, BiCoN is a novel systems medicine tool that combines several heuristic optimization strategies for robust disease mechanism extraction. BiCoN is well documented and freely available as a Python package or a web interface. AVAILABILITY AND IMPLEMENTATION: PyPI package: https://pypi.org/project/bicon. Web interface: https://exbio.wzw.tum.de/bicon. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

20.
Bioinformatics; 37(18): 3008-3010, 2021 Sep 29.
Article in English | MEDLINE | ID: mdl-33647976

ABSTRACT

SUMMARY: A plethora of tools exist for RNA-Seq data analysis with a focus on alternative splicing (AS). However, appropriate data for their comparative evaluation are missing. The R package ASimulatoR simulates gold-standard RNA-Seq datasets with fine-grained control over the distribution of AS events, which allows alternative splicing tools to be evaluated, e.g. to study the effect of sequencing depth on the performance of AS event detection. AVAILABILITY AND IMPLEMENTATION: ASimulatoR is freely available at https://github.com/biomedbigdata/ASimulatoR as an R package under the GPL-3 license.


Subjects
Alternative Splicing; Software; RNA-Seq; Sequence Analysis, RNA; Computer Simulation