Results 1 - 20 of 35
1.
Brief Bioinform ; 25(2), 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38305455

ABSTRACT

Novel hypotheses in biomedical research are often developed or validated in model organisms such as mice and zebrafish, which thus play a crucial role. However, due to biological differences between species, translating these findings into human applications remains challenging. Moreover, commonly used orthologous gene information is often incomplete and entails a significant information loss during gene-ID conversion. To address these issues, we present a novel methodology for species-agnostic transfer learning with heterogeneous domain adaptation. We extended the cross-domain structure-preserving projection toward out-of-sample prediction. Our approach not only allows knowledge integration and translation across various species without relying on gene orthology but also identifies similar Gene Ontology (GO) terms among the most influential genes composing the latent space for integration. Subsequently, during the alignment of latent spaces, each composed of species-specific genes, it is possible to identify functional annotations of genes missing from public orthology databases. We evaluated our approach with four different single-cell sequencing datasets focusing on cell-type prediction and compared it against related machine-learning approaches. In summary, the developed model outperforms related methods working without prior knowledge when predicting unseen cell types based on other species' data. The results demonstrate that our novel approach allows knowledge transfer beyond species barriers without depending on known gene orthology, instead utilizing the entire gene sets.


Subjects
Algorithms; Zebrafish; Mice; Humans; Animals; Zebrafish/genetics; Gene Expression Profiling; Species Specificity; Machine Learning
2.
Bioinformatics ; 40(Suppl 1): i91-i99, 2024 06 28.
Article in English | MEDLINE | ID: mdl-38940173

ABSTRACT

MOTIVATION: High-throughput screens (HTS) provide a powerful tool to decipher the causal effects of chemical and genetic perturbations on cancer cell lines. Their ability to evaluate a wide spectrum of interventions, from single drugs to intricate drug combinations and CRISPR-interference, has established them as an invaluable resource for the development of novel therapeutic approaches. Nevertheless, the combinatorial complexity of potential interventions makes a comprehensive exploration intractable. Hence, prioritizing interventions for further experimental investigation becomes of utmost importance. RESULTS: We propose CODEX (COunterfactual Deep learning for the in silico EXploration of cancer cell line perturbations) as a general framework for the causal modeling of HTS data, linking perturbations to their downstream consequences. CODEX relies on a stringent causal modeling strategy based on counterfactual reasoning. As such, CODEX predicts drug-specific cellular responses, comprising cell survival and molecular alterations, and facilitates the in silico exploration of drug combinations. This is achieved for both bulk and single-cell HTS. We further show that CODEX provides a rationale to explore complex genetic modifications from CRISPR-interference in silico in single cells. AVAILABILITY AND IMPLEMENTATION: Our implementation of CODEX is publicly available at https://github.com/sschrod/CODEX. All data used in this article are publicly available.


Subjects
Computer Simulation; Deep Learning; Humans; Cell Line, Tumor; High-Throughput Screening Assays/methods; Neoplasms/metabolism; Computational Biology/methods; Software; Antineoplastic Agents/pharmacology
3.
Mol Psychiatry ; 29(2): 387-401, 2024 02.
Article in English | MEDLINE | ID: mdl-38177352

ABSTRACT

Applications of machine learning in the biomedical sciences are growing rapidly. This growth has been spurred by diverse cross-institutional and interdisciplinary collaborations, public availability of large datasets, an increase in the accessibility of analytic routines, and the availability of powerful computing resources. With this increased access and exposure to machine learning comes a responsibility for education and a deeper understanding of its bases and bounds, borne equally by data scientists seeking to ply their analytic wares in medical research and by biomedical scientists seeking to harness such methods to glean knowledge from data. This article provides an accessible and critical review of machine learning for a biomedically informed audience, as well as its applications in psychiatry. The review covers definitions and expositions of commonly used machine learning methods, and historical trends of their use in psychiatry. We also provide a set of standards, namely Guidelines for REporting Machine Learning Investigations in Neuropsychiatry (GREMLIN), for designing and reporting studies that use machine learning as a primary data-analysis approach. Lastly, we propose the establishment of the Machine Learning in Psychiatry (MLPsych) Consortium, enumerate its objectives, and identify areas of opportunity for future applications of machine learning in biological psychiatry. This review serves as a cautiously optimistic primer on machine learning for those on the precipice as they prepare to dive into the field, either as methodological practitioners or well-informed consumers.


Subjects
Biological Psychiatry; Machine Learning; Humans; Biological Psychiatry/methods; Psychiatry/methods; Biomedical Research/methods
4.
Nucleic Acids Res ; 51(D1): D217-D225, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36453996

ABSTRACT

MirDIP is a well-established database that aggregates human microRNA-gene interactions from multiple databases to increase coverage, reduce bias, and improve usability by providing an integrated score proportional to the probability of the interaction occurring. In version 5.2, we removed eight outdated resources, added a new resource (miRNATIP), and ran five prediction algorithms for miRBase and mirGeneDB. In total, mirDIP 5.2 includes 46 364 047 predictions for 27 936 genes and 2734 microRNAs, making it the first database to provide interactions using data from mirGeneDB. Moreover, we curated and integrated 32 497 novel microRNAs from 14 publications to accelerate the use of these novel data. In this release, we also extend the content and functionality of mirDIP by associating contexts with microRNAs, genes, and microRNA-gene interactions. We collected and processed microRNA and gene expression data from 20 resources and acquired information on 330 tissue and disease contexts for 2657 microRNAs, 27 576 genes and 123 651 910 gene-microRNA-tissue interactions. Finally, we improved the usability of mirDIP by enabling the user to search the database using precursor IDs, and we integrated miRAnno, a network-based tool for identifying pathways linked to specific microRNAs. We also provide a mirDIP API to facilitate access to its integrated predictions. Updated mirDIP is available at https://ophid.utoronto.ca/mirDIP.


Subjects
MicroRNAs; Humans; Algorithms; Databases, Nucleic Acid; Epistasis, Genetic; MicroRNAs/genetics; MicroRNAs/metabolism; Molecular Sequence Annotation; Data Curation
5.
BMC Bioinformatics ; 25(1): 171, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38689234

ABSTRACT

BACKGROUND: Recent developments in single-cell RNA sequencing have opened up a multitude of possibilities to study tissues at the level of cellular populations. However, the heterogeneity in single-cell sequencing data necessitates appropriate procedures to adjust for technological limitations and various sources of noise when integrating datasets from different studies. While many analysis procedures employ various preprocessing steps, they often overlook the importance of selecting and optimizing the employed data transformation methods. RESULTS: This work investigates data transformation approaches used in single-cell clustering analysis tools and their effects on batch integration analysis. In particular, we compare 16 transformations and their impact on the low-dimensional representations, aiming to reduce the batch effect and integrate multiple single-cell sequencing datasets. Our results show that data transformations strongly influence the results of single-cell clustering on low-dimensional data spaces, such as those generated by UMAP or PCA. Moreover, these changes in low-dimensional space also significantly affect trajectory analysis using multiple datasets. However, the performance of the data transformations varies greatly across datasets, and the optimal method was different for each dataset. Additionally, we explored how data transformation impacts the analysis of deep feature encodings using deep neural network-based models, including autoencoder-based models and prototypical networks. Data transformation also strongly affects the outcome of deep neural network models. CONCLUSIONS: Our findings suggest that the batch effect and noise in integrative analysis are highly influenced by data transformation. Low-dimensional features can integrate different batches well when proper data transformation is applied. Furthermore, we found that the batch mixing score on low-dimensional space can guide the selection of the optimal data transformation. In conclusion, data preprocessing is one of the most crucial analysis steps and needs to be cautiously considered in the integrative analysis of multiple scRNA-seq datasets.


Subjects
RNA-Seq; Single-Cell Gene Expression Analysis; Humans; Algorithms; Cluster Analysis; Neural Networks, Computer; RNA-Seq/methods; Single-Cell Gene Expression Analysis/methods
6.
Bioinformatics ; 39(11), 2023 11 01.
Article in English | MEDLINE | ID: mdl-37988152

ABSTRACT

SUMMARY: Federated learning enables collaboration in medicine, where data is scattered across multiple centers, without the need to aggregate the data in a central cloud. While, in general, machine learning models can be applied to a wide range of data types, graph neural networks (GNNs) are developed specifically for graphs, which are very common in the biomedical domain. For instance, a patient can be represented by a protein-protein interaction (PPI) network where the nodes contain the patient-specific omics features. Here, we present our Ensemble-GNN software package, which can be used to deploy federated, ensemble-based GNNs in Python. Ensemble-GNN allows users to quickly build predictive models utilizing PPI networks consisting of various node features such as gene expression and/or DNA methylation. As an example, we show the results from a public dataset of 981 patients and 8469 genes from the Cancer Genome Atlas (TCGA). AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/pievos101/Ensemble-GNN, and the data at Zenodo (DOI: 10.5281/zenodo.8305122).


Subjects
DNA Methylation; Machine Learning; Humans; Neural Networks, Computer; Protein Interaction Maps; Software
7.
J Biomed Inform ; 150: 104600, 2024 02.
Article in English | MEDLINE | ID: mdl-38301750

ABSTRACT

BACKGROUND: Lack of trust in artificial intelligence (AI) models in medicine is still the key obstacle to the use of AI in clinical decision support systems (CDSS). Although AI models are already performing excellently in systems medicine, their black-box nature entails that patient-specific decisions are incomprehensible for the physician. Explainable AI (XAI) algorithms aim to "explain" to a human domain expert which input features influenced a specific recommendation. However, in the clinical domain, these explanations must lead to some degree of causal understanding by a clinician. RESULTS: We developed the CLARUS platform, aiming to promote human understanding of graph neural network (GNN) predictions. CLARUS enables the visualisation of patient-specific networks as well as relevance values for genes and interactions, computed by XAI methods such as GNNExplainer. This enables domain experts to gain deeper insights into the network and, more importantly, the expert can interactively alter the patient-specific network based on the acquired understanding and initiate re-prediction or retraining. This interactivity allows us to ask manual counterfactual questions and analyse the effects on the GNN prediction. CONCLUSION: We present the first interactive XAI platform prototype, CLARUS, that allows not only the evaluation of specific human counterfactual questions based on user-defined alterations of patient networks and a re-prediction of the clinical outcome but also a retraining of the entire GNN after changing the underlying graph structures. The platform is currently hosted by the GWDG on https://rshiny.gwdg.de/apps/clarus/.


Subjects
Decision Support Systems, Clinical; Physicians; Humans; Artificial Intelligence; Neural Networks, Computer; Algorithms; Tolnaftate
8.
Nucleic Acids Res ; 50(5): e30, 2022 03 21.
Article in English | MEDLINE | ID: mdl-34908135

ABSTRACT

The use of complex biological molecules to solve computational problems is an emerging field at the interface between biology and computer science. There are two main categories in which biological molecules, especially DNA, are investigated as alternatives to silicon-based computer technologies. One is to use DNA as a storage medium, and the other is to use DNA for computing. Both strategies come with certain constraints. In the current study, we present a novel approach derived from chaos game representation for DNA to generate DNA code words that fulfill user-defined constraints, namely GC content, homopolymers, and undesired motifs, and thus, can be used to build codes for reliable DNA storage systems.
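
The three constraints named in this abstract (GC content, homopolymers, undesired motifs) can be checked directly on a candidate code word. The following is a minimal sketch of such a check only; it is not the chaos-game-representation generator described by the authors. The function name, the default thresholds, and the example motif are illustrative assumptions.

```python
def satisfies_constraints(word, gc_min=0.4, gc_max=0.6,
                          max_homopolymer=3, forbidden_motifs=()):
    """Hypothetical helper: True if a DNA code word meets all three constraints.

    The default thresholds are assumptions chosen for illustration only.
    """
    word = word.upper()
    # Reject empty words and anything outside the DNA alphabet.
    if not word or set(word) - set("ACGT"):
        return False

    # Constraint 1: GC content must lie within [gc_min, gc_max].
    gc = sum(base in "GC" for base in word) / len(word)
    if not gc_min <= gc <= gc_max:
        return False

    # Constraint 2: the longest run of identical bases must not exceed the limit.
    longest = run = 1
    for prev, cur in zip(word, word[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    if longest > max_homopolymer:
        return False

    # Constraint 3: no undesired motif may occur anywhere in the word.
    return not any(m and m.upper() in word for m in forbidden_motifs)


if __name__ == "__main__":
    for candidate in ("ACGTACGT", "AAAAGCGC", "ATATATAT"):
        print(candidate, satisfies_constraints(candidate))
```

A generator would propose candidate words and keep only those for which the check passes; how candidates are proposed is exactly where the published approach differs from this sketch.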


Subjects
Computational Biology/methods; DNA; Fractals
9.
Brief Bioinform ; 22(2): 642-663, 2021 03 22.
Article in English | MEDLINE | ID: mdl-33147627

ABSTRACT

SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a novel virus of the family Coronaviridae. The virus causes the infectious disease COVID-19. The biology of coronaviruses has been studied for many years. However, bioinformatics tools designed explicitly for SARS-CoV-2 have only recently been developed as a rapid reaction to the need for fast detection, understanding and treatment of COVID-19. To control the ongoing COVID-19 pandemic, it is of utmost importance to get insight into the evolution and pathogenesis of the virus. In this review, we cover bioinformatics workflows and tools for the routine detection of SARS-CoV-2 infection, the reliable analysis of sequencing data, the tracking of the COVID-19 pandemic and evaluation of containment measures, the study of coronavirus evolution, the discovery of potential drug targets and development of therapeutic strategies. For each tool, we briefly describe its use case and how it advances research specifically for SARS-CoV-2. All tools are free to use and available online, either through web applications or public code repositories. Contact: evbc@unj-jena.de.


Subjects
COVID-19/prevention & control; Computational Biology; SARS-CoV-2/isolation & purification; Biomedical Research; COVID-19/epidemiology; COVID-19/virology; Genome, Viral; Humans; Pandemics; SARS-CoV-2/genetics
10.
Bioinformatics ; 38(8): 2278-2286, 2022 04 12.
Article in English | MEDLINE | ID: mdl-35139148

ABSTRACT

MOTIVATION: Limited data access has hindered the field of precision medicine from exploring its full potential, e.g. concerning machine learning and privacy and data protection rules. Our study evaluates the efficacy of federated Random Forests (FRF) models, focusing particularly on the heterogeneity within and between datasets. We addressed three common challenges: (i) number of parties, (ii) sizes of datasets and (iii) imbalanced phenotypes, evaluated on five biomedical datasets. RESULTS: The FRF outperformed the average local models and performed comparably to the data-centralized models trained on the entire data. With an increasing number of models and decreasing dataset size, the performance of local models decreases drastically. The performance of the FRF, however, does not decrease significantly. When combining datasets of different sizes, the FRF vastly improve compared to the average local models. By analyzing different class imbalances, we demonstrate that the FRF remain more robust and outperform the local models. Our results support that FRF overcome boundaries of clinical research and enable collaborations across institutes without violating privacy or legal regulations. Clinicians benefit from a vast collection of unbiased data aggregated from different geographic locations, demographics and other varying factors. They can build more generalizable models to make better clinical decisions, which will have relevance, especially for patients in rural areas and rare or geographically uncommon diseases, enabling personalized treatment. In combination with secure multi-party computation, federated learning has the power to revolutionize clinical practice by increasing the accuracy and robustness of healthcare AI and thus paving the way for precision medicine. AVAILABILITY AND IMPLEMENTATION: The implementation of the federated random forests can be found at https://featurecloud.ai/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Privacy; Random Forest; Machine Learning; Precision Medicine; Delivery of Health Care
11.
Bioinformatics ; 38(2): 325-334, 2022 01 03.
Article in English | MEDLINE | ID: mdl-34613360

ABSTRACT

MOTIVATION: Antimicrobial resistance (AMR) is one of the biggest global problems threatening human and animal health. Rapid and accurate AMR diagnostic methods are thus urgently needed. However, traditional antimicrobial susceptibility testing (AST) is time-consuming, low throughput and viable only for cultivable bacteria. Machine learning methods may pave the way for automated AMR prediction based on genomic data of the bacteria. However, a comparison of different machine learning methods for the prediction of AMR based on different encodings and whole-genome sequencing data without prior knowledge remains to be done. RESULTS: In this study, we evaluated logistic regression (LR), support vector machine (SVM), random forest (RF) and convolutional neural network (CNN) for the prediction of AMR for the antibiotics ciprofloxacin, cefotaxime, ceftazidime and gentamicin. We demonstrate that these models can effectively predict AMR with label encoding, one-hot encoding and frequency matrix chaos game representation (FCGR encoding) on whole-genome sequencing data. We trained these models on a large AMR dataset and evaluated them on an independent public dataset. Generally, RFs and CNNs perform better than LR and SVM, with AUCs up to 0.96. Furthermore, we were able to identify mutations that are associated with AMR for each antibiotic. AVAILABILITY AND IMPLEMENTATION: Source code for data preparation and model training is provided on GitHub (https://github.com/YunxiaoRen/ML-iAMR). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
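
The three encodings named in this abstract can be illustrated on a short DNA string. The sketch below is not taken from the authors' repository: the integer codes used for label encoding, the base order used for one-hot encoding, and the corner assignment used for the FCGR grid are assumptions, and conventions for all three vary between implementations.

```python
BASES = "ACGT"
# Assumed corner bits (x, y) for the chaos game grid; other layouts are in use.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def label_encode(seq):
    """Map each base to an integer (A=1, C=2, G=3, T=4; anything else 0)."""
    return [BASES.find(base) + 1 for base in seq.upper()]


def one_hot_encode(seq):
    """Map each base to a 4-element indicator vector in A, C, G, T order."""
    return [[int(base == ref) for ref in BASES] for base in seq.upper()]


def fcgr(seq, k=2):
    """Count k-mers into a 2^k x 2^k frequency matrix (rows = y, columns = x)."""
    seq = seq.upper()
    size = 2 ** k
    matrix = [[0] * size for _ in range(size)]
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in CORNERS for base in kmer):
            continue  # skip k-mers containing ambiguous bases such as N
        x = y = 0
        for base in kmer:
            bx, by = CORNERS[base]
            x, y = 2 * x + bx, 2 * y + by
        matrix[y][x] += 1
    return matrix


if __name__ == "__main__":
    print(label_encode("ACGT"))
    print(one_hot_encode("AC"))
    print(fcgr("ACGTAC", k=2))
```

Label and one-hot encodings grow with sequence length, whereas the FCGR matrix has a fixed size for a given k, which is one reason it is convenient as input to a convolutional network.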


Subjects
Anti-Bacterial Agents; Drug Resistance, Bacterial; Animals; Humans; Anti-Bacterial Agents/pharmacology; Drug Resistance, Bacterial/genetics; Ciprofloxacin; Machine Learning; Genomics; Bacteria/genetics
12.
J Med Internet Res ; 25: e42621, 2023 07 12.
Article in English | MEDLINE | ID: mdl-37436815

ABSTRACT

BACKGROUND: Machine learning and artificial intelligence have shown promising results in many areas and are driven by the increasing amount of available data. However, these data are often distributed across different institutions and cannot be easily shared owing to strict privacy regulations. Federated learning (FL) allows the training of distributed machine learning models without sharing sensitive data. However, its implementation is time-consuming and requires advanced programming skills and complex technical infrastructures. OBJECTIVE: Various tools and frameworks have been developed to simplify the development of FL algorithms and provide the necessary technical infrastructure. Although there are many high-quality frameworks, most focus only on a single application case or method. To our knowledge, there are no generic frameworks, meaning that the existing solutions are restricted to a particular type of algorithm or application field. Furthermore, most of these frameworks provide an application programming interface that requires programming knowledge. There is no collection of ready-to-use FL algorithms that are extendable and allow users (eg, researchers) without programming knowledge to apply FL. A central FL platform for both FL algorithm developers and users does not exist. This study aimed to address this gap and make FL available to everyone by developing FeatureCloud, an all-in-one platform for FL in biomedicine and beyond. METHODS: The FeatureCloud platform consists of 3 main components: a global frontend, a global backend, and a local controller. Our platform uses Docker to separate the locally acting components of the platform from the sensitive data systems. We evaluated our platform using 4 different algorithms on 5 data sets for both accuracy and runtime.
RESULTS: FeatureCloud removes the complexity of distributed systems for developers and end users by providing a comprehensive platform for executing multi-institutional FL analyses and implementing FL algorithms. Through its integrated artificial intelligence store, federated algorithms can easily be published and reused by the community. To secure sensitive raw data, FeatureCloud supports privacy-enhancing technologies to secure the shared local models and assures high standards in data privacy to comply with the strict General Data Protection Regulation. Our evaluation shows that applications developed in FeatureCloud can produce highly similar results compared with centralized approaches and scale well for an increasing number of participating sites. CONCLUSIONS: FeatureCloud provides a ready-to-use platform that integrates the development and execution of FL algorithms while reducing the complexity to a minimum and removing the hurdles of federated infrastructure. Thus, we believe that it has the potential to greatly increase the accessibility of privacy-preserving and distributed data analyses in biomedicine and beyond.


Subjects
Algorithms; Artificial Intelligence; Humans; Health Occupations; Software; Computer Communication Networks; Privacy
13.
Nucleic Acids Res ; 46(D1): D360-D370, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29194489

ABSTRACT

MicroRNAs are important regulators of gene expression, achieved by binding to the gene to be regulated. Even with modern high-throughput technologies, it is laborious and expensive to detect all possible microRNA targets. For this reason, several computational microRNA-target prediction tools have been developed, each with its own strengths and limitations. Integration of different tools has been a successful approach to minimize the shortcomings of individual databases. Here, we present mirDIP v4.1, providing nearly 152 million human microRNA-target predictions, which were collected across 30 different resources. We also introduce an integrative score, which was statistically inferred from the obtained predictions, and was assigned to each unique microRNA-target interaction to provide a unified measure of confidence. We demonstrate that integrating predictions across multiple resources does not cumulate prediction bias toward biological processes or pathways. mirDIP v4.1 is freely available at http://ophid.utoronto.ca/mirDIP/.


Subjects
Databases, Genetic; MicroRNAs/metabolism; RNA, Messenger/metabolism; Humans; RNA, Messenger/chemistry
14.
Article in English | MEDLINE | ID: mdl-37126621

ABSTRACT

Despite their remarkable performance, deep neural networks remain unadopted in clinical practice, which is considered to be partially due to their lack of explainability. In this work, we apply explainable attribution methods to a pre-trained deep neural network for abnormality classification in 12-lead electrocardiography to open this "black box" and understand the relationship between model prediction and learned features. We classify data from two public databases (CPSC 2018, PTB-XL) and the attribution methods assign a "relevance score" to each sample of the classified signals. This allows analyzing what the network learned during training, for which we propose quantitative methods: average relevance scores over a) classes, b) leads, and c) average beats. The analyses of relevance scores for atrial fibrillation and left bundle branch block compared to healthy controls show that their mean values a) increase with higher classification probability and correspond to false classifications when around zero, and b) correspond to clinical recommendations regarding which lead to consider. Furthermore, c) visible P-waves and concordant T-waves result in clearly negative relevance scores in atrial fibrillation and left bundle branch block classification, respectively. Results are similar across both databases despite differences in study population and hardware. In summary, our analysis suggests that the DNN learned features similar to cardiology textbook knowledge.
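
The aggregation step described in this abstract, averaging per-sample relevance scores over classes and over leads, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the data layout (one list of sample-wise scores per lead), the function names, and the toy values are all hypothetical, and the attribution method that produces the scores is not shown.

```python
from collections import defaultdict
from statistics import fmean


def mean_relevance_by_class(records):
    """Average all relevance scores per class label.

    records: iterable of (class_label, relevance), where relevance[lead][sample]
    holds the score assigned to one sample of one lead (assumed layout).
    """
    per_class = defaultdict(list)
    for label, relevance in records:
        per_class[label].extend(score for lead in relevance for score in lead)
    return {label: fmean(scores) for label, scores in per_class.items()}


def mean_relevance_by_lead(relevance):
    """Average the relevance scores of a single recording separately per lead."""
    return [fmean(lead) for lead in relevance]


if __name__ == "__main__":
    # Toy two-lead recordings with two samples each; the values are made up.
    records = [
        ("AF", [[1.0, 3.0], [2.0, 2.0]]),
        ("NORM", [[0.0, 0.0], [1.0, -1.0]]),
    ]
    print(mean_relevance_by_class(records))
    print(mean_relevance_by_lead(records[0][1]))
```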

15.
Comput Biol Med ; 143: 105263, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35131608

ABSTRACT

BACKGROUND: The main screening parameter to monitor prostate cancer recurrence (PCR) after primary treatment is the serum concentration of prostate-specific antigen (PSA). In recent years, Ga-68-PSMA PET/CT has become an important method for additional diagnostics in patients with biochemical recurrence. PURPOSE: While Ga-68-PSMA PET/CT performs better, it is an expensive, invasive, and time-consuming examination. Therefore, in this study, we aim to employ modern multivariate Machine Learning (ML) methods on electronic health records (EHR) of prostate cancer patients to improve the prediction of imaging confirmed PCR (IPCR). METHODS: We retrospectively analyzed the clinical information of 272 patients who were examined using Ga-68-PSMA PET/CT. The PSA values ranged from 0 ng/mL to 2270.38 ng/mL with a median PSA level of 1.79 ng/mL. We performed a descriptive analysis using Logistic Regression. Additionally, we evaluated the predictive performance of Logistic Regression, Support Vector Machine, Gradient Boosting, and Random Forest. Finally, we assessed the importance of all features using Ensemble Feature Selection (EFS). RESULTS: The descriptive analysis found significant associations between IPCR and logarithmic PSA values as well as between IPCR and performed hormonal therapy. Our models were able to predict IPCR with an AUC score of 0.78 ± 0.13 (mean ± standard deviation) and a sensitivity of 0.997 ± 0.01. Features such as PSA, PSA doubling time, PSA velocity, hormonal therapy, radiation treatment, and injected activity show high importance for IPCR prediction using EFS. CONCLUSION: This study demonstrates the potential of incorporating a multitude of parameters into multivariate ML models to improve identification of non-recurring patients compared to the current focus on the main screening parameter (PSA). We showed that ML models are able to predict IPCR, detectable by Ga-68-PSMA PET/CT, and thereby pave the way for optimized early imaging and treatment.
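
Two of the features listed in this abstract, PSA velocity and PSA doubling time, are derived from serial PSA measurements. The abstract does not state how they were computed; the sketch below assumes the commonly used definitions (velocity as the least-squares slope of PSA over time, doubling time as ln 2 divided by the least-squares slope of ln PSA over time). Function names and example values are hypothetical.

```python
import math


def _least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired measurements")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("measurement times must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def psa_velocity(times_years, psa_values):
    """PSA change per year (ng/mL/year), assuming times are given in years."""
    return _least_squares_slope(times_years, psa_values)


def psa_doubling_time(times, psa_values):
    """Time for PSA to double, in the unit of `times`; infinite if PSA is not rising."""
    if any(value <= 0 for value in psa_values):
        raise ValueError("PSA values must be positive to take logarithms")
    slope = _least_squares_slope(times, [math.log(v) for v in psa_values])
    return math.log(2) / slope if slope > 0 else math.inf


if __name__ == "__main__":
    times = [0.0, 1.0, 2.0]   # years since first measurement (made-up values)
    psa = [1.0, 2.0, 4.0]     # ng/mL (made-up values)
    print(psa_velocity(times, psa), psa_doubling_time(times, psa))
```

Because the doubling time relies on logarithms, measurements of exactly 0 ng/mL, which the reported range includes, would need separate handling (for example a detection-limit floor); the sketch simply rejects them.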

16.
iScience ; 25(12): 105534, 2022 Dec 22.
Article in English | MEDLINE | ID: mdl-36437879

ABSTRACT

The long-lasting trend of medical informatics is to adapt novel technologies in the medical context. In particular, incorporating artificial intelligence to support clinical decision-making can significantly improve monitoring, diagnostics, and prognostics for the patient's and medic's sake. However, obstacles hinder a timely technology transfer from research to the clinic. Due to the pressure for novelty in the research context, projects rarely implement quality standards. Here, we propose a guideline for academic software life cycle processes tailored to the needs and capabilities of research organizations. While the complete implementation of a software life cycle according to commercial standards is not feasible in scientific work, we propose a subset of elements that we are convinced will provide a significant benefit while keeping the effort within a feasible range. Ultimately, the emerging quality checks for academic software development can pave the way for an accelerated deployment of academic advances in clinical practice.

17.
Cancers (Basel) ; 13(13), 2021 Jun 24.
Article in English | MEDLINE | ID: mdl-34202427

ABSTRACT

The rapid improvement of next-generation sequencing (NGS) technologies and their application in large-scale cohorts in cancer research led to common challenges of big data. It opened a new research area incorporating systems biology and machine learning. As large-scale NGS data accumulated, sophisticated data analysis methods became indispensable. In addition, NGS data have been integrated with systems biology to build better predictive models to determine the characteristics of tumors and tumor subtypes. Therefore, various machine learning algorithms were introduced to identify underlying biological mechanisms. In this work, we review novel technologies developed for NGS data analysis, and we describe how these computational methodologies integrate systems biology and omics data. Subsequently, we discuss how deep neural networks outperform other approaches, the potential of graph neural networks (GNN) in systems biology, and the limitations in NGS biomedical research. To reflect on the various challenges and corresponding computational solutions, we will discuss the following three topics: (i) molecular characteristics, (ii) tumor heterogeneity, and (iii) drug discovery. We conclude that machine learning and network-based approaches can add valuable insights and build highly accurate models. However, a well-informed choice of learning algorithm and biological network information is crucial for the success of each specific research question.

18.
NAR Genom Bioinform ; 3(4): lqab104, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34805988

ABSTRACT

Tremendous advances in next-generation sequencing technology have enabled the accumulation of large amounts of omics data in various research areas over the past decade. However, study limitations due to small sample sizes, especially in rare disease clinical research, technological heterogeneity and batch effects limit the applicability of traditional statistics and machine learning analysis. Here, we present a meta-transfer learning approach to transfer knowledge from big data and reduce the search space in data with small sample sizes. Few-shot learning algorithms integrate meta-learning to overcome data scarcity and data heterogeneity by transferring molecular pattern recognition models from datasets of unrelated domains. We explore few-shot learning models with the large-scale public datasets TCGA (The Cancer Genome Atlas) and GTEx, and demonstrate their potential as pre-training datasets in other molecular pattern recognition tasks. Our results show that meta-transfer learning is very effective for datasets with a limited sample size. Furthermore, we show that our approach can transfer knowledge across technological heterogeneity, for example, from bulk-cell to single-cell data. Our approach can overcome study size constraints, batch effects and technical limitations in analyzing single-cell data by leveraging existing bulk-cell sequencing data.

19.
NAR Genom Bioinform ; 3(2): lqab039, 2021 Jun.
Article in English | MEDLINE | ID: mdl-34046590

ABSTRACT

Owing to the great variety of distinct peptide encodings, working on a given biomedical classification task is challenging. Researchers have to determine encodings capable of representing underlying patterns as numerical input for the subsequent machine learning. A general guideline is lacking in the literature; thus, we present here the first large-scale comprehensive study to investigate the performance of a wide range of encodings on multiple datasets from different biomedical domains. For the sake of completeness, we added additional sequence- and structure-based encodings. In particular, we collected 50 biomedical datasets and defined a fixed parameter space for 48 encoding groups, leading to a total of 397 700 encoded datasets. Our results demonstrate that none of the encodings are superior for all biomedical domains. Nevertheless, some encodings often outperform others, thus reducing the initial encoding selection substantially. Our work enables researchers to objectively compare novel encodings to the state of the art. Our findings pave the way for a more sophisticated encoding optimization, for example, as part of automated machine learning pipelines. The work presented here is implemented as a large-scale, end-to-end workflow designed for easy reproducibility and extensibility. All standardized datasets and results are available for download to comply with FAIR standards.

20.
iScience ; 24(7): 102803, 2021 Jul 23.
Article in English | MEDLINE | ID: mdl-34296072

ABSTRACT

Computational methods can transform healthcare. In particular, health informatics with artificial intelligence has shown tremendous potential when applied in various fields of medical research and has opened a new era for precision medicine. The development of reusable biomedical software for research or clinical practice is time-consuming and requires rigorous compliance with quality requirements as defined by international standards. However, research projects rarely implement such measures, hindering smooth technology transfer into the research community or manufacturers as well as reproducibility and reusability. Here, we present a guideline for quality management systems (QMS) for academic organizations incorporating the essential components while confining the requirements to an easily manageable effort. It provides a starting point to implement a QMS tailored to specific needs effortlessly and greatly facilitates technology transfer in a controlled manner, thereby supporting reproducibility and reusability. Ultimately, the emerging standardized workflows can pave the way for an accelerated deployment in clinical practice.
