Search | VHL (BVS) Search Portal

1.

DeepLoc 2.1: multi-label membrane protein type prediction using protein language models.

Ødum, Marius Thrane; Teufel, Felix; Thumuluri, Vineet; Almagro Armenteros, José Juan; Johansen, Alexander Rosenberg; Winther, Ole; Nielsen, Henrik.

Nucleic Acids Res ; 52(W1): W215-W220, 2024 Jul 05.

Article in English | MEDLINE | ID: mdl-38587188

ABSTRACT

DeepLoc 2.0 is a popular web server for the prediction of protein subcellular localization and sorting signals. Here, we introduce DeepLoc 2.1, which additionally classifies the input proteins into the membrane protein types Transmembrane, Peripheral, Lipid-anchored and Soluble. Leveraging pre-trained transformer-based protein language models, the server utilizes a three-stage architecture for sequence-based, multi-label predictions. Comparative evaluations with other established tools on a test set of 4933 eukaryotic protein sequences, constructed following stringent homology partitioning, demonstrate state-of-the-art performance. Notably, DeepLoc 2.1 outperforms existing models, with the larger ProtT5 model exhibiting a marginal advantage over the ESM-1B model. The web server is available at https://services.healthtech.dtu.dk/services/DeepLoc-2.1.

Subjects

Membrane Proteins , Software , Membrane Proteins/chemistry , Membrane Proteins/metabolism , Internet , Protein Sorting Signals , Sequence Analysis, Protein

2.

RNA trafficking and subcellular localization-a review of mechanisms, experimental and predictive methodologies.

Wang, Jun; Horlacher, Marc; Cheng, Lixin; Winther, Ole.

Brief Bioinform ; 24(5)2023 09 20.

Article in English | MEDLINE | ID: mdl-37466130

ABSTRACT

RNA localization is essential for regulating spatial translation, where RNAs are trafficked to their target locations via various biological mechanisms. In this review, we discuss RNA localization in the context of molecular mechanisms, experimental techniques and machine learning-based prediction tools. Three main types of molecular mechanisms that control the localization of RNA to distinct cellular compartments are reviewed, including directed transport, protection from mRNA degradation, as well as diffusion and local entrapment. Advances in experimental methods, both image and sequence based, provide substantial data resources, which allow for the design of powerful machine learning models to predict RNA localizations. We review the publicly available predictive tools to serve as a guide for users and inspire developers to build more effective prediction models. Finally, we provide an overview of multimodal learning, which may provide a new avenue for the prediction of RNA localization.

Subjects

RNA Transport , RNA , RNA/genetics , RNA Transport/physiology , Machine Learning , Computational Biology/methods

3.

DeepLocRNA: an interpretable deep learning model for predicting RNA subcellular localization with domain-specific transfer-learning.

Wang, Jun; Horlacher, Marc; Cheng, Lixin; Winther, Ole.

Bioinformatics ; 40(2)2024 02 01.

Article in English | MEDLINE | ID: mdl-38317052

ABSTRACT

MOTIVATION: Accurate prediction of RNA subcellular localization plays an important role in understanding cellular processes and functions. Although post-transcriptional processes are governed by trans-acting RNA binding proteins (RBPs) through interaction with cis-regulatory RNA motifs, current methods do not incorporate RBP-binding information. RESULTS: In this article, we propose DeepLocRNA, an interpretable deep-learning model that leverages a pre-trained multi-task RBP-binding prediction model to predict the subcellular localization of RNA molecules via fine-tuning. We constructed DeepLocRNA using a comprehensive dataset with variant RNA types and evaluated it on the held-out dataset. Our model achieved state-of-the-art performance in predicting RNA subcellular localization in mRNA and miRNA. It has also demonstrated great generalization capabilities, performing well on both human and mouse RNA. Additionally, a motif analysis was performed to enhance the interpretability of the model, highlighting signal factors that contributed to the predictions. The proposed model provides general and powerful prediction abilities for different RNA types and species, offering valuable insights into the localization patterns of RNA molecules and contributing to our understanding of cellular processes at the molecular level. A user-friendly web server is available at: https://biolib.com/KU/DeepLocRNA/.

Subjects

Deep Learning , Animals , Humans , Mice , RNA/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Nucleotide Motifs , RNA-Binding Proteins/metabolism , Computational Biology/methods

4.

Deep integrative models for large-scale human genomics.

Sigurdsson, Arnór I; Louloudis, Ioannis; Banasik, Karina; Westergaard, David; Winther, Ole; Lund, Ole; Ostrowski, Sisse Rye; Erikstrup, Christian; Pedersen, Ole Birger Vesterager; Nyegaard, Mette; Brunak, Søren; Vilhjálmsson, Bjarni J; Rasmussen, Simon.

Nucleic Acids Res ; 51(12): e67, 2023 07 07.

Article in English | MEDLINE | ID: mdl-37224538

ABSTRACT

Polygenic risk scores (PRSs) are expected to play a critical role in precision medicine. Currently, PRS predictors are generally based on linear models using summary statistics, and more recently individual-level data. However, these predictors mainly capture additive relationships and are limited in data modalities they can use. We developed a deep learning framework (EIR) for PRS prediction which includes a model, genome-local-net (GLN), specifically designed for large-scale genomics data. The framework supports multi-task learning, automatic integration of other clinical and biochemical data, and model explainability. When applied to individual-level data from the UK Biobank, the GLN model demonstrated a competitive performance compared to established neural network architectures, particularly for certain traits, showcasing its potential in modeling complex genetic relationships. Furthermore, the GLN model outperformed linear PRS methods for Type 1 Diabetes, likely due to modeling non-additive genetic effects and epistasis. This was supported by our identification of widespread non-additive genetic effects and epistasis in the context of T1D. Finally, we constructed PRS models that integrated genotype, blood, urine, and anthropometric data and found that this improved performance for 93% of the 290 diseases and disorders considered. EIR is available at https://github.com/arnor-sigurdsson/EIR.

Subjects

Models, Genetic , Multifactorial Inheritance , Polymorphism, Single Nucleotide , Humans , Genetic Predisposition to Disease , Genome, Human , Genome-Wide Association Study , Genomics/methods , Genotype , Risk Factors

5.

DeepPeptide predicts cleaved peptides in proteins using conditional random fields.

Teufel, Felix; Refsgaard, Jan Christian; Madsen, Christian Toft; Stahlhut, Carsten; Grønborg, Mads; Winther, Ole; Madsen, Dennis.

Bioinformatics ; 39(10)2023 10 03.

Article in English | MEDLINE | ID: mdl-37812217

ABSTRACT

MOTIVATION: Peptides are ubiquitous throughout life and involved in a wide range of biological processes, ranging from neural signaling in higher organisms to antimicrobial peptides in bacteria. Many peptides are generated post-translationally by cleavage of precursor proteins and can thus not be detected directly from genomics data, as the specificities of the responsible proteases are often not completely understood. RESULTS: We present DeepPeptide, a deep learning model that predicts cleaved peptides directly from the amino acid sequence. DeepPeptide shows both improved precision and recall for peptide detection compared to previous methodology. We show that the model is capable of identifying peptides in underannotated proteomes. AVAILABILITY AND IMPLEMENTATION: DeepPeptide is available online at ku.biolib.com/DeepPeptide.

Subjects

Peptide Hydrolases , Peptides , Peptides/chemistry , Amino Acid Sequence , Peptide Hydrolases/metabolism , Proteome/metabolism

6.

DeepLoc 2.0: multi-label subcellular localization prediction using protein language models.

Thumuluri, Vineet; Almagro Armenteros, José Juan; Johansen, Alexander Rosenberg; Nielsen, Henrik; Winther, Ole.

Nucleic Acids Res ; 50(W1): W228-W234, 2022 07 05.

Article in English | MEDLINE | ID: mdl-35489069

ABSTRACT

The prediction of protein subcellular localization is of great relevance for proteomics research. Here, we propose an update to the popular tool DeepLoc with multi-localization prediction and improvements in both performance and interpretability. For training and validation, we curate eukaryotic and human multi-location protein datasets with stringent homology partitioning and enriched with sorting signal information compiled from the literature. We achieve state-of-the-art performance in DeepLoc 2.0 by using a pre-trained protein language model. It has the further advantage that it uses sequence input rather than relying on slower protein profiles. We provide two means of better interpretability: an attention output along the sequence and highly accurate prediction of nine different types of protein sorting signals. We find that the attention output correlates well with the position of sorting signals. The webserver is available at services.healthtech.dtu.dk/service.php?DeepLoc-2.0.

Subjects

Protein Sorting Signals , Proteins , Humans , Proteins/metabolism , Eukaryota/metabolism , Protein Transport , Language , Databases, Protein , Computational Biology , Subcellular Fractions/metabolism

7.

NetSurfP-3.0: accurate and fast prediction of protein structural features by protein language models and deep learning.

Høie, Magnus Haraldson; Kiehl, Erik Nicolas; Petersen, Bent; Nielsen, Morten; Winther, Ole; Nielsen, Henrik; Hallgren, Jeppe; Marcatili, Paolo.

Nucleic Acids Res ; 50(W1): W510-W515, 2022 07 05.

Article in English | MEDLINE | ID: mdl-35648435

ABSTRACT

Recent advances in machine learning and natural language processing have made it possible to profoundly advance our ability to accurately predict protein structures and their functions. While such improvements are significantly impacting the fields of biology and biotechnology at large, such methods have the downside of high demands in terms of computing power and runtime, hampering their applicability to large datasets. Here, we present NetSurfP-3.0, a tool for predicting solvent accessibility, secondary structure, structural disorder and backbone dihedral angles for each residue of an amino acid sequence. This NetSurfP update exploits recent advances in pre-trained protein language models to drastically improve the runtime of its predecessor by two orders of magnitude, while displaying similar prediction performance. We assessed the accuracy of NetSurfP-3.0 on several independent test datasets and found it to consistently produce state-of-the-art predictions for each of its output features, with a runtime that is up to 600 times faster than the most commonly available methods performing the same tasks. The tool is freely available as a web server with a user-friendly interface to navigate the results, as well as a standalone downloadable package.

Subjects

Deep Learning , Natural Language Processing , Protein Structure, Secondary , Proteins , Amino Acid Sequence , Proteins/chemistry , Proteins/metabolism , Datasets as Topic , Solvents/chemistry , Time Factors , Internet , Computers , Software

8.

Deorphanizing Peptides Using Structure Prediction.

Teufel, Felix; Refsgaard, Jan C; Kasimova, Marina A; Deibler, Kristine; Madsen, Christian T; Stahlhut, Carsten; Grønborg, Mads; Winther, Ole; Madsen, Dennis.

J Chem Inf Model ; 63(9): 2651-2655, 2023 05 08.

Article in English | MEDLINE | ID: mdl-37092865

ABSTRACT

Many endogenous peptides rely on signaling pathways to exert their function, but identifying their cognate receptors remains a challenging problem. We investigate the use of AlphaFold-Multimer complex structure prediction together with transmembrane topology prediction for peptide deorphanization. We find that AlphaFold's confidence metrics have strong performance for prioritizing true peptide-receptor interactions. In a library of 1112 human receptors, the method ranks true receptors in the top percentile on average for 11 benchmark peptide-receptor pairs.

Subjects

Peptides , Signal Transduction , Humans , Peptides/metabolism
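
The ranking result reported in entry 8 (true receptors landing in the top percentile of a 1112-receptor library) can be illustrated with a minimal sketch. The abstract does not say which confidence metric is used or exactly how the percentile is computed, so the function name `rank_percentile`, the score dictionary, and the "count strictly better-scoring candidates" convention are all assumptions made for illustration.

```python
def rank_percentile(scores, true_id):
    """Percentile position of `true_id` when candidates are sorted by
    descending score. Smaller is better: 1.0 means top 1% of the library.

    `scores` maps a (hypothetical) receptor identifier to a confidence
    score; how that score is derived is an assumption, not the paper's method.
    """
    true_score = scores[true_id]
    # Rank 1 is the best candidate; ties with the true receptor do not push it down.
    rank = 1 + sum(1 for s in scores.values() if s > true_score)
    return 100.0 * rank / len(scores)


# Toy library of four hypothetical receptors.
toy_scores = {"R1": 0.9, "R2": 0.5, "R3": 0.7, "R4": 0.1}
print(rank_percentile(toy_scores, "R3"))
```

In a library of 1112 candidates, a percentile of 1.0 under this convention would correspond to the true receptor ranking within roughly the first 11 entries.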

9.

Graph neural network interatomic potential ensembles with calibrated aleatoric and epistemic uncertainty on energy and forces.

Busk, Jonas; Schmidt, Mikkel N; Winther, Ole; Vegge, Tejs; Jørgensen, Peter Bjørn.

Phys Chem Chem Phys ; 25(37): 25828-25837, 2023 Sep 27.

Article in English | MEDLINE | ID: mdl-37724552

ABSTRACT

Inexpensive machine learning (ML) potentials are increasingly being used to speed up structural optimization and molecular dynamics simulations of materials by iteratively predicting and applying interatomic forces. In these settings, it is crucial to detect when predictions are unreliable to avoid wrong or misleading results. Here, we present a complete framework for training and recalibrating graph neural network ensemble models to produce accurate predictions of energy and forces with calibrated uncertainty estimates. The proposed method considers both epistemic and aleatoric uncertainty and the total uncertainties are recalibrated post hoc using a nonlinear scaling function to achieve good calibration on previously unseen data, without loss of predictive accuracy. The method is demonstrated and evaluated on two challenging, publicly available datasets, ANI-1x (Smith et al. J. Chem. Phys., 2018, 148, 241733.) and Transition1x (Schreiner et al. Sci. Data, 2022, 9, 779.), both containing diverse conformations far from equilibrium. A detailed analysis of the predictive performance and uncertainty calibration is provided. In all experiments, the proposed method achieved low prediction error and good uncertainty calibration, with predicted uncertainty correlating with expected error, on energy and forces. To the best of our knowledge, the method presented in this paper is the first to consider a complete framework for obtaining calibrated epistemic and aleatoric uncertainty predictions on both energy and forces in ML potentials.
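
As a rough illustration of how an ensemble can separate the two kinds of uncertainty named in entry 9, the sketch below uses the common mixture-of-Gaussians decomposition: aleatoric uncertainty as the average of the members' predicted variances, epistemic uncertainty as the variance of the members' predicted means. This is an assumption about the general approach, not the authors' exact formulation, and the post hoc nonlinear recalibration step is not reproduced because the abstract does not specify it. The function name `ensemble_uncertainty` is hypothetical.

```python
from statistics import fmean, pvariance


def ensemble_uncertainty(means, variances):
    """Combine per-member predictions (mean, variance) of one quantity.

    Returns (ensemble mean, aleatoric variance, epistemic variance,
    total variance), treating the ensemble as an equally weighted
    mixture of Gaussians. Assumed decomposition, for illustration only.
    """
    mean = fmean(means)
    aleatoric = fmean(variances)           # average predicted noise
    epistemic = pvariance(means, mu=mean)  # disagreement between members
    return mean, aleatoric, epistemic, aleatoric + epistemic


# Three hypothetical ensemble members predicting one energy value.
print(ensemble_uncertainty([1.0, 2.0, 3.0], [0.1, 0.2, 0.3]))
```

A large epistemic term relative to the aleatoric term suggests the members disagree, which is typically the signal used to flag inputs far from the training data.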

10.

Explainable Image Quality Assessments in Teledermatological Photography.

Jalaboi, Raluca; Winther, Ole; Galimzianova, Alfiia.

Telemed J E Health ; 29(9): 1342-1348, 2023 09.

Article in English | MEDLINE | ID: mdl-36735575

ABSTRACT

Background and Objectives: Image quality is a crucial factor in the effectiveness and efficiency of teledermatological consultations. However, up to 50% of images sent by patients have quality issues, thus increasing the time to diagnosis and treatment. An automated, easily deployable, explainable method for assessing image quality is necessary to improve the current teledermatological consultation flow. We introduce ImageQX, a convolutional neural network for image quality assessment with a learning mechanism for identifying the most common poor image quality explanations: bad framing, bad lighting, blur, low resolution, and distance issues. Methods: ImageQX was trained on 26,635 photographs and validated on 9,874 photographs, each annotated with image quality labels and poor image quality explanations by up to 12 board-certified dermatologists. The photographic images were taken between 2017 and 2019 using a mobile skin disease tracking application accessible worldwide. Results: Our method achieves expert-level performance for both image quality assessment and poor image quality explanation. For image quality assessment, ImageQX obtains a macro F1-score of 0.73 ± 0.01, which places it within standard deviation of the pairwise inter-rater F1-score of 0.77 ± 0.07. For poor image quality explanations, our method obtains F1-scores of between 0.37 ± 0.01 and 0.70 ± 0.01, similar to the inter-rater pairwise F1-score of between 0.24 ± 0.15 and 0.83 ± 0.06. Moreover, with a size of only 15 MB, ImageQX is easily deployable on mobile devices. Conclusion: With an image quality detection performance similar to that of dermatologists, incorporating ImageQX into the teledermatology flow can enable a better, faster flow for remote consultations.

Subjects

Mobile Applications , Remote Consultation , Skin Neoplasms , Humans , Skin Neoplasms/diagnosis , Neural Networks, Computer , Photography
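
Entry 10 reports its results as macro F1-scores. For readers unfamiliar with the metric, the following minimal sketch computes it as the unweighted mean of per-class F1 scores; the labels and data are invented for illustration, and the paper's own evaluation code is not shown in the abstract.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list.

    Expects two non-empty, equal-length sequences of hashable labels.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    per_class = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class.append(2 * tp[label] / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical image-quality labels for four photographs.
truth = ["good", "good", "bad", "bad"]
preds = ["good", "bad", "bad", "bad"]
print(macro_f1(truth, preds))
```

Because every class contributes equally regardless of its frequency, macro F1 penalizes a model that performs well only on the majority class.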

11.

scVAE: variational auto-encoders for single-cell gene expression data.

Grønbech, Christopher Heje; Vording, Maximillian Fornitz; Timshel, Pascal N; Sønderby, Casper Kaae; Pers, Tune H; Winther, Ole.

Bioinformatics ; 36(16): 4415-4422, 2020 08 15.

Article in English | MEDLINE | ID: mdl-32415966

ABSTRACT

MOTIVATION: Models for analysing and making relevant biological inferences from massive amounts of complex single-cell transcriptomic data typically require several individual data-processing steps, each with their own set of hyperparameter choices. With deep generative models one can work directly with count data, make likelihood-based model comparison, learn a latent representation of the cells and capture more of the variability in different cell populations. RESULTS: We propose a novel method based on variational auto-encoders (VAEs) for analysis of single-cell RNA sequencing (scRNA-seq) data. It avoids data preprocessing by using raw count data as input and can robustly estimate the expected gene expression levels and a latent representation for each cell. We tested several count likelihood functions and a variant of the VAE that has a priori clustering in the latent space. We show for several scRNA-seq datasets that our method outperforms recently proposed scRNA-seq methods in clustering cells and that the resulting clusters reflect cell types. AVAILABILITY AND IMPLEMENTATION: Our method, called scVAE, is implemented in Python using the TensorFlow machine-learning library, and it is freely available at https://github.com/scvae/scvae. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Gene Expression Profiling , Single-Cell Analysis , Likelihood Functions , Sequence Analysis, RNA , Software

12.

Reducing the rate of psychiatric re-admissions in bipolar disorder using smartphones-The RADMIS trial.

Faurholt-Jepsen, Maria; Lindbjerg Tønning, Morten; Fros, Mads; Martiny, Klaus; Tuxen, Nanna; Rosenberg, Nicole; Busk, Jonas; Winther, Ole; Thaysen-Petersen, Daniel; Aamund, Kate Andreasson; Tolderlund, Lizzie; Bardram, Jakob Eyvind; Kessing, Lars Vedel.

Acta Psychiatr Scand ; 143(5): 453-465, 2021 05.

Article in English | MEDLINE | ID: mdl-33354769

ABSTRACT

OBJECTIVES: The MONARCA I and II trials were negative but suggested that smartphone-based monitoring may increase quality of life and reduce perceived stress in bipolar disorder (BD). The present trial was the first to investigate the effect of smartphone-based monitoring on the rate and duration of readmissions in BD. METHODS: This was a randomized controlled single-blind parallel-group trial. Patients with BD (ICD-10) discharged from hospitalization in the Mental Health Services, Capital Region of Denmark were randomized 1:1 to daily smartphone-based monitoring including a feedback loop (+ standard treatment) or to standard treatment for 6 months. Primary outcomes: the rate and duration of psychiatric readmissions. RESULTS: We included 98 patients with BD. In ITT analyses, there was no statistically significant difference in rates (hazard rate: 1.05, 95% CI: 0.54; 1.91, p = 0.88) or duration of readmission between the two groups (B: 3.67, 95% CI: -4.77; 12.11, p = 0.39). There was no difference in scores on the Hamilton Depression Rating Scale (B = -0.11, 95% CI: -2.50; 2.29, p = 0.93). The intervention group had higher scores on the Young Mania Rating Scale (B: 1.89, 95% CI: 0.0078; 3.78, p = 0.050). The intervention group reported lower levels of perceived stress (B: -7.18, 95% CI: -13.50; -0.86, p = 0.026) and lower levels of rumination (B: -6.09, 95% CI: -11.19; -1.00, p = 0.019). CONCLUSIONS: Smartphone-based monitoring did not reduce rate and duration of readmissions. There was no difference in levels of depressive symptoms. The intervention group had higher levels of manic symptoms, but lower perceived stress and rumination compared with the control group.

Subjects

Bipolar Disorder , Bipolar Disorder/therapy , Hospitalization , Humans , Quality of Life , Single-Blind Method , Smartphone

13.

Deconvolution of autoencoders to learn biological regulatory modules from single cell mRNA sequencing data.

Kinalis, Savvas; Nielsen, Finn Cilius; Winther, Ole; Bagger, Frederik Otzen.

BMC Bioinformatics ; 20(1): 379, 2019 Jul 08.

Article in English | MEDLINE | ID: mdl-31286861

ABSTRACT

BACKGROUND: Unsupervised machine learning methods (deep learning) have shown their usefulness with noisy single cell mRNA-sequencing data (scRNA-seq), where the models generalize well, despite the zero-inflation of the data. A class of neural networks, namely autoencoders, has been useful for denoising of single cell data, imputation of missing values and dimensionality reduction. RESULTS: Here, we present a striking feature with the potential to greatly increase the usability of autoencoders: With specialized training, the autoencoder is not only able to generalize over the data, but also to tease apart biologically meaningful modules, which we found encoded in the representation layer of the network. Our model can, from scRNA-seq data, delineate biologically meaningful modules that govern a dataset, as well as give information as to which modules are active in each single cell. Importantly, most of these modules can be explained by known biological functions, as provided by the Hallmark gene sets. CONCLUSIONS: We discover that tailored training of an autoencoder makes it possible to deconvolute biological modules inherent in the data, without any assumptions. By comparisons with gene signatures of canonical pathways we see that the modules are directly interpretable. The scope of this discovery has important implications, as it makes it possible to outline the drivers behind a given effect of a cell. In comparison with other dimensionality reduction methods, or supervised models for classification, our approach has the benefit of both handling well the zero-inflated nature of scRNA-seq, and validating that the model captures relevant information, by establishing a link between input and decoded data. In perspective, our model in combination with clustering methods is able to provide information about which subtype a given single cell belongs to, as well as which biological functions determine that membership.

Subjects

Gene Expression Profiling/methods , Neural Networks, Computer , RNA, Messenger/chemistry , Sequence Analysis, RNA/methods , Unsupervised Machine Learning , Cluster Analysis , RNA, Messenger/metabolism , Single-Cell Analysis

14.

NetSurfP-2.0: Improved prediction of protein structural features by integrated deep learning.

Klausen, Michael Schantz; Jespersen, Martin Closter; Nielsen, Henrik; Jensen, Kamilla Kjaergaard; Jurtz, Vanessa Isabell; Sønderby, Casper Kaae; Sommer, Morten Otto Alexander; Winther, Ole; Nielsen, Morten; Petersen, Bent; Marcatili, Paolo.

Proteins ; 87(6): 520-527, 2019 06.

Article in English | MEDLINE | ID: mdl-30785653

ABSTRACT

The ability to predict local structural features of a protein from the primary sequence is of paramount importance for unraveling its function in absence of experimental structural information. Two main factors affect the utility of potential prediction tools: their accuracy must enable extraction of reliable structural information on the proteins of interest, and their runtime must be low to keep pace with sequencing data being generated at a constantly increasing speed. Here, we present NetSurfP-2.0, a novel tool that can predict the most important local structural features with unprecedented accuracy and runtime. NetSurfP-2.0 is sequence-based and uses an architecture composed of convolutional and long short-term memory neural networks trained on solved protein structures. Using a single integrated model, NetSurfP-2.0 predicts solvent accessibility, secondary structure, structural disorder, and backbone dihedral angles for each residue of the input sequences. We assessed the accuracy of NetSurfP-2.0 on several independent test datasets and found it to consistently produce state-of-the-art predictions for each of its output features. We observe a correlation of 80% between predictions and experimental data for solvent accessibility, and a precision of 85% on secondary structure 3-class predictions. In addition to improved accuracy, the processing time has been optimized to allow predicting more than 1000 proteins in less than 2 hours, and complete proteomes in less than 1 day.

Subjects

Databases, Protein , Deep Learning , Computational Biology , Protein Structure, Secondary , Proteome/chemistry

15.

DeepLoc: prediction of protein subcellular localization using deep learning.

Almagro Armenteros, José Juan; Sønderby, Casper Kaae; Sønderby, Søren Kaae; Nielsen, Henrik; Winther, Ole.

Bioinformatics ; 33(21): 3387-3395, 2017 Nov 01.

Article in English | MEDLINE | ID: mdl-29036616

ABSTRACT

MOTIVATION: The prediction of eukaryotic protein subcellular localization is a well-studied topic in bioinformatics due to its relevance in proteomics research. Many machine learning methods have been successfully applied in this task, but in most of them, predictions rely on annotation of homologues from knowledge databases. For novel proteins where no annotated homologues exist, and for predicting the effects of sequence variants, it is desirable to have methods for predicting protein properties from sequence information only. RESULTS: Here, we present a prediction algorithm using deep neural networks to predict protein subcellular localization relying only on sequence information. At its core, the prediction model uses a recurrent neural network that processes the entire protein sequence and an attention mechanism identifying protein regions important for the subcellular localization. The model was trained and tested on a protein dataset extracted from one of the latest UniProt releases, in which experimentally annotated proteins follow more stringent criteria than previously. We demonstrate that our model achieves a good accuracy (78% for 10 categories; 92% for membrane-bound or soluble), outperforming current state-of-the-art algorithms, including those relying on homology information. AVAILABILITY AND IMPLEMENTATION: The method is available as a web server at http://www.cbs.dtu.dk/services/DeepLoc. Example code is available at https://github.com/JJAlmagro/subcellular_localization. The dataset is available at http://www.cbs.dtu.dk/services/DeepLoc/data.php. CONTACT: jjalma@dtu.dk.

Subjects

Computational Biology/methods , Machine Learning , Protein Transport , Sequence Analysis, Protein/methods , Software , Eukaryota/metabolism , Eukaryotic Cells/metabolism , Models, Biological , Molecular Sequence Annotation/methods , Neural Networks, Computer

16.

An introduction to deep learning on biological sequence data: examples and solutions.

Jurtz, Vanessa Isabell; Johansen, Alexander Rosenberg; Nielsen, Morten; Almagro Armenteros, Jose Juan; Nielsen, Henrik; Sønderby, Casper Kaae; Winther, Ole; Sønderby, Søren Kaae.

Bioinformatics ; 33(22): 3685-3690, 2017 Nov 15.

Article in English | MEDLINE | ID: mdl-28961695

ABSTRACT

MOTIVATION: Deep neural network architectures such as convolutional and long short-term memory networks have become increasingly popular as machine learning tools during the recent years. The availability of greater computational resources, more data, new algorithms for training deep models and easy to use libraries for implementation and training of neural networks are the drivers of this development. The use of deep learning has been especially successful in image recognition; and the development of tools, applications and code examples are in most cases centered within this field rather than within biology. RESULTS: Here, we aim to further the development of deep learning methods within biology by providing application examples and ready to apply and adapt code templates. Given such examples, we illustrate how architectures consisting of convolutional and long short-term memory neural networks can relatively easily be designed and trained to state-of-the-art performance on three biological sequence problems: prediction of subcellular localization, protein secondary structure and the binding of peptides to MHC Class II molecules. AVAILABILITY AND IMPLEMENTATION: All implementations and datasets are available online to the scientific community at https://github.com/vanessajurtz/lasagne4bio. CONTACT: skaaesonderby@gmail.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Machine Learning , Protein Structure, Secondary , Protein Transport , Sequence Analysis, Protein/methods , Computational Biology/methods , Neural Networks, Computer , Peptides/metabolism , Protein Binding

17.

BloodSpot: a database of gene expression profiles and transcriptional programs for healthy and malignant haematopoiesis.

Bagger, Frederik Otzen; Sasivarevic, Damir; Sohi, Sina Hadi; Laursen, Linea Gøricke; Pundhir, Sachin; Sønderby, Casper Kaae; Winther, Ole; Rapin, Nicolas; Porse, Bo T.

Nucleic Acids Res ; 44(D1): D917-24, 2016 Jan 04.

Article in English | MEDLINE | ID: mdl-26507857

ABSTRACT

Research on human and murine haematopoiesis has resulted in a vast number of gene-expression data sets that can potentially answer questions regarding normal and aberrant blood formation. To researchers and clinicians with limited bioinformatics experience, these data have remained available, yet largely inaccessible. Current databases provide information about gene-expression but fail to answer key questions regarding co-regulation, genetic programs or effect on patient survival. To address these shortcomings, we present BloodSpot (www.bloodspot.eu), which includes and greatly extends our previously released database HemaExplorer, a database of gene expression profiles from FACS sorted healthy and malignant haematopoietic cells. A revised interactive interface simultaneously provides a plot of gene expression along with a Kaplan-Meier analysis and a hierarchical tree depicting the relationship between different cell types in the database. The database now includes 23 high-quality curated data sets relevant to normal and malignant blood formation and, in addition, we have assembled and built a unique integrated data set, BloodPool. BloodPool contains more than 2000 samples assembled from six independent studies on acute myeloid leukemia. Furthermore, we have devised a robust sample integration procedure that allows for sensitive comparison of user-supplied patient samples in a well-defined haematopoietic cellular space.

Subjects

Databases, Genetic , Gene Expression Profiling , Hematopoiesis/genetics , Leukemia, Myeloid, Acute/genetics , Transcription, Genetic , Animals , Hematopoietic Stem Cells/metabolism , Humans , Leukemia, Myeloid, Acute/metabolism , Leukemia, Myeloid, Acute/mortality , Mice

18.

Gaussian process based independent analysis for temporal source separation in fMRI.

Hald, Ditte Høvenhoff; Henao, Ricardo; Winther, Ole.

Neuroimage ; 152: 563-574, 2017 05 15.

Article in English | MEDLINE | ID: mdl-28249758

ABSTRACT

Functional Magnetic Resonance Imaging (fMRI) gives us a unique insight into the processes of the brain, and opens up for analyzing the functional activation patterns of the underlying sources. Task-inferred supervised learning with restrictive assumptions in the regression set-up, restricts the exploratory nature of the analysis. Fully unsupervised independent component analysis (ICA) algorithms, on the other hand, can struggle to detect clear classifiable components on single-subject data. We attribute this shortcoming to inadequate modeling of the fMRI source signals by failing to incorporate its temporal nature. fMRI source signals, biological stimuli and non-stimuli-related artifacts are all smooth over a time-scale compatible with the sampling time (TR). We therefore propose Gaussian process ICA (GPICA), which facilitates temporal dependency by the use of Gaussian process source priors. On two fMRI data sets with different sampling frequency, we show that the GPICA-inferred temporal components and associated spatial maps allow for a more definite interpretation than standard temporal ICA methods. The temporal structures of the sources are controlled by the covariance of the Gaussian process, specified by a kernel function with an interpretable and controllable temporal length scale parameter. We propose a hierarchical model specification, considering both instantaneous and convolutive mixing, and we infer source spatial maps, temporal patterns and temporal length scale parameters by Markov Chain Monte Carlo. A companion implementation made as a plug-in for SPM can be downloaded from https://github.com/dittehald/GPICA.

Subjects

Brain Mapping/methods , Brain/physiology , Magnetic Resonance Imaging , Algorithms , Artifacts , Humans , Image Enhancement , Monte Carlo Method , Normal Distribution , Computer-Assisted Signal Processing

19.

RSK is a principal effector of the RAS-ERK pathway for eliciting a coordinate promotile/invasive gene program and phenotype in epithelial cells.

Doehn, Ulrik; Hauge, Camilla; Frank, Scott R; Jensen, Claus J; Duda, Katarzyna; Nielsen, Jakob V; Cohen, Michael S; Johansen, Jens V; Winther, Benny R; Lund, Leif R; Winther, Ole; Taunton, Jack; Hansen, Steen H; Frödin, Morten.

Mol Cell ; 35(4): 511-22, 2009 Aug 28.

Article in English | MEDLINE | ID: mdl-19716794

ABSTRACT

The RAS-stimulated RAF-MEK-ERK pathway confers epithelial cells with critical motile and invasive capacities during development, tissue regeneration, and carcinoma progression, often via promoting the epithelial-mesenchymal transition (EMT). Many mechanisms by which ERK exerts this control remain elusive. We demonstrate that the ERK-activated kinase RSK is necessary to induce mesenchymal motility and invasive capacities in nontransformed epithelial and carcinoma cells. RSK is sufficient to induce certain motile responses. Expression profiling analysis revealed that a primary role of RSK is to induce transcription of a potent promotile/invasive gene program by FRA1-dependent and -independent mechanisms. The program enables RSK to coordinately modulate the extracellular environment, the intracellular motility apparatus, and receptors mediating communication between these compartments to stimulate motility and invasion. These findings uncover a mechanism whereby the RAS-ERK pathway controls epithelial cell motility by identifying RSK as a key effector, from which emanate multiple highly coordinate transcription-dependent mechanisms for stimulation of motility and invasive properties.

Subjects

Carcinoma/enzymology , Cell Movement , Cell Transdifferentiation , Neoplastic Cell Transformation/metabolism , Epithelial Cells/enzymology , Extracellular Signal-Regulated MAP Kinases/metabolism , 90-kDa Ribosomal Protein S6 Kinases/metabolism , ras Proteins/metabolism , Animals , Carcinoma/genetics , Carcinoma/pathology , Cell Line , Cell Movement/genetics , Cell Transdifferentiation/genetics , Neoplastic Cell Transformation/genetics , Neoplastic Cell Transformation/pathology , Dogs , Epithelial Cells/pathology , Gene Expression Profiling , Neoplastic Gene Expression Regulation , Genotype , Humans , Mesoderm/enzymology , Mesoderm/pathology , Neoplasm Invasiveness , Phenotype , Proto-Oncogene Proteins c-fos/genetics , Proto-Oncogene Proteins c-fos/metabolism , Signal Transduction , Time Factors , Genetic Transcription , Genetic Transduction

20.

Comparing cancer vs normal gene expression profiles identifies new disease entities and common transcriptional programs in AML patients.

Rapin, Nicolas; Bagger, Frederik Otzen; Jendholm, Johan; Mora-Jensen, Helena; Krogh, Anders; Kohlmann, Alexander; Thiede, Christian; Borregaard, Niels; Bullinger, Lars; Winther, Ole; Theilgaard-Mönch, Kim; Porse, Bo T.

Blood ; 123(6): 894-904, 2014 Feb 06.

Article in English | MEDLINE | ID: mdl-24363398

ABSTRACT

Gene expression profiling has been used extensively to characterize cancer, identify novel subtypes, and improve patient stratification. However, it has largely failed to identify transcriptional programs that differ between cancer and corresponding normal cells and has not been efficient in identifying expression changes fundamental to disease etiology. Here we present a method that facilitates the comparison of any cancer sample to its nearest normal cellular counterpart, using acute myeloid leukemia (AML) as a model. We first generated a gene expression-based landscape of the normal hematopoietic hierarchy, using expression profiles from normal stem/progenitor cells, and next mapped the AML patient samples to this landscape. This allowed us to identify the closest normal counterpart of individual AML samples and determine gene expression changes between cancer and normal. We find the cancer vs normal method (CvN method) to be superior to conventional methods in stratifying AML patients with aberrant karyotype and in identifying common aberrant transcriptional programs with potential importance for AML etiology. Moreover, the CvN method uncovered a novel poor-outcome subtype of normal-karyotype AML, which allowed for the generation of a highly prognostic survival signature. Collectively, our CvN method holds great potential as a tool for the analysis of gene expression profiles of cancer patients.

Subjects

Tumor Biomarkers/genetics , Hematopoietic Stem Cells/metabolism , Acute Myeloid Leukemia/genetics , Western Blotting , Case-Control Studies , Follow-Up Studies , Gene Expression Profiling , Humans , Acute Myeloid Leukemia/pathology , Oligonucleotide Array Sequence Analysis , Prognosis , Messenger RNA/genetics , Real-Time Polymerase Chain Reaction , Reverse Transcriptase Polymerase Chain Reaction , Survival Rate
