Results 1 - 11 of 11
1.
Bioinformatics ; 36(16): 4415-4422, 2020 Aug 15.
Article in English | MEDLINE | ID: mdl-32415966

ABSTRACT

MOTIVATION: Models for analysing and making relevant biological inferences from massive amounts of complex single-cell transcriptomic data typically require several individual data-processing steps, each with its own set of hyperparameter choices. With deep generative models one can work directly with count data, perform likelihood-based model comparison, learn a latent representation of the cells and capture more of the variability in different cell populations. RESULTS: We propose a novel method based on variational auto-encoders (VAEs) for analysis of single-cell RNA sequencing (scRNA-seq) data. It avoids data preprocessing by using raw count data as input and can robustly estimate the expected gene expression levels and a latent representation for each cell. We tested several count likelihood functions and a variant of the VAE that has a priori clustering in the latent space. We show for several scRNA-seq datasets that our method outperforms recently proposed scRNA-seq methods in clustering cells and that the resulting clusters reflect cell types. AVAILABILITY AND IMPLEMENTATION: Our method, called scVAE, is implemented in Python using the TensorFlow machine-learning library, and it is freely available at https://github.com/scvae/scvae. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
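
The abstract above mentions testing several count likelihood functions on raw counts. As a rough illustration of what such a comparison involves, the sketch below scores integer counts under a Poisson and a negative binomial model. It is not taken from scVAE: the function names, the mean/dispersion parameterization and the example numbers are assumptions chosen for illustration only.

```python
import math


def poisson_log_likelihood(k: int, mean: float) -> float:
    """log P(k | mean) under a Poisson count model (variance equals the mean)."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def negative_binomial_log_likelihood(k: int, mean: float, dispersion: float) -> float:
    """log P(k | mean, dispersion); variance = mean + mean**2 / dispersion.

    The mean/dispersion parameterization is an assumption for this sketch.
    """
    r = dispersion
    return (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mean))
        + k * math.log(mean / (r + mean))
    )


def total_log_likelihood(counts, means, log_pmf, **params) -> float:
    """Sum per-gene log-likelihoods of observed counts given expected means."""
    return sum(log_pmf(k, m, **params) for k, m in zip(counts, means))


# Hypothetical counts for three genes in one cell, with hypothetical expected means.
counts = [0, 3, 30]
means = [0.5, 2.5, 2.5]
print(total_log_likelihood(counts, means, poisson_log_likelihood))
print(total_log_likelihood(counts, means, negative_binomial_log_likelihood, dispersion=2.0))
```

A higher total log-likelihood means the model explains the counts better. In this toy input the count of 30 against an expected mean of 2.5 is far more plausible under the overdispersed negative binomial than under the Poisson.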


Subject(s)
Gene Expression Profiling, Single-Cell Analysis, Likelihood Functions, RNA Sequence Analysis, Software
2.
Proteins ; 87(6): 520-527, 2019 Jun.
Article in English | MEDLINE | ID: mdl-30785653

ABSTRACT

The ability to predict local structural features of a protein from the primary sequence is of paramount importance for unraveling its function in the absence of experimental structural information. Two main factors affect the utility of potential prediction tools: their accuracy must enable extraction of reliable structural information on the proteins of interest, and their runtime must be low to keep pace with sequencing data being generated at a constantly increasing speed. Here, we present NetSurfP-2.0, a novel tool that can predict the most important local structural features with unprecedented accuracy and runtime. NetSurfP-2.0 is sequence-based and uses an architecture composed of convolutional and long short-term memory neural networks trained on solved protein structures. Using a single integrated model, NetSurfP-2.0 predicts solvent accessibility, secondary structure, structural disorder, and backbone dihedral angles for each residue of the input sequences. We assessed the accuracy of NetSurfP-2.0 on several independent test datasets and found it to consistently produce state-of-the-art predictions for each of its output features. We observe a correlation of 80% between predictions and experimental data for solvent accessibility, and a precision of 85% on secondary structure 3-class predictions. In addition to improved accuracy, the processing time has been optimized to allow predicting more than 1000 proteins in less than 2 hours, and complete proteomes in less than 1 day.


Subject(s)
Protein Databases, Deep Learning, Computational Biology, Protein Secondary Structure, Proteome/chemistry
3.
Bioinformatics ; 33(21): 3387-3395, 2017 Nov 01.
Article in English | MEDLINE | ID: mdl-29036616

ABSTRACT

MOTIVATION: The prediction of eukaryotic protein subcellular localization is a well-studied topic in bioinformatics due to its relevance in proteomics research. Many machine learning methods have been successfully applied in this task, but in most of them, predictions rely on annotation of homologues from knowledge databases. For novel proteins where no annotated homologues exist, and for predicting the effects of sequence variants, it is desirable to have methods for predicting protein properties from sequence information only. RESULTS: Here, we present a prediction algorithm using deep neural networks to predict protein subcellular localization relying only on sequence information. At its core, the prediction model uses a recurrent neural network that processes the entire protein sequence and an attention mechanism identifying protein regions important for the subcellular localization. The model was trained and tested on a protein dataset extracted from one of the latest UniProt releases, in which experimentally annotated proteins follow more stringent criteria than previously. We demonstrate that our model achieves a good accuracy (78% for 10 categories; 92% for membrane-bound or soluble), outperforming current state-of-the-art algorithms, including those relying on homology information. AVAILABILITY AND IMPLEMENTATION: The method is available as a web server at http://www.cbs.dtu.dk/services/DeepLoc. Example code is available at https://github.com/JJAlmagro/subcellular_localization. The dataset is available at http://www.cbs.dtu.dk/services/DeepLoc/data.php. CONTACT: jjalma@dtu.dk.
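
The abstract above describes an attention mechanism that identifies the protein regions most important for localization. The sketch below shows the general idea of attention pooling over per-residue hidden states. The abstract does not say how positions are scored, so the dot-product scoring against a `query` vector, the function names and the toy numbers are all assumptions, not the published model.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden_states: List[List[float]], query: List[float]
) -> Tuple[List[float], List[float]]:
    """Return (per-position weights, weighted-average context vector).

    hidden_states: one vector per residue, e.g. recurrent network outputs.
    query: a learned vector in a real model; hypothetical here.
    """
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return weights, context


# Toy example: three residues with two-dimensional hidden states.
states = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
weights, context = attention_pool(states, query=[1.0, 0.0])
print(weights)  # the third residue receives the largest weight
print(context)
```

The weights sum to one, so they can be read as the relative importance of each sequence position, and the context vector is what a downstream classifier would consume.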


Subject(s)
Computational Biology/methods, Machine Learning, Protein Transport, Protein Sequence Analysis/methods, Software, Eukaryota/metabolism, Eukaryotic Cells/metabolism, Biological Models, Molecular Sequence Annotation/methods, Neural Networks (Computer)
4.
Bioinformatics ; 33(22): 3685-3690, 2017 Nov 15.
Article in English | MEDLINE | ID: mdl-28961695

ABSTRACT

MOTIVATION: Deep neural network architectures such as convolutional and long short-term memory networks have become increasingly popular as machine learning tools in recent years. The availability of greater computational resources, more data, new algorithms for training deep models and easy-to-use libraries for implementing and training neural networks are the drivers of this development. The use of deep learning has been especially successful in image recognition, and the development of tools, applications and code examples is in most cases centered within this field rather than within biology. RESULTS: Here, we aim to further the development of deep learning methods within biology by providing application examples and code templates that are ready to apply and adapt. Given such examples, we illustrate how architectures consisting of convolutional and long short-term memory neural networks can relatively easily be designed and trained to state-of-the-art performance on three biological sequence problems: prediction of subcellular localization, protein secondary structure and the binding of peptides to MHC Class II molecules. AVAILABILITY AND IMPLEMENTATION: All implementations and datasets are available online to the scientific community at https://github.com/vanessajurtz/lasagne4bio. CONTACT: skaaesonderby@gmail.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Machine Learning, Protein Secondary Structure, Protein Transport, Protein Sequence Analysis/methods, Computational Biology/methods, Neural Networks (Computer), Peptides/metabolism, Protein Binding
5.
Nucleic Acids Res ; 44(D1): D917-24, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26507857

ABSTRACT

Research on human and murine haematopoiesis has resulted in a vast number of gene-expression data sets that can potentially answer questions regarding normal and aberrant blood formation. To researchers and clinicians with limited bioinformatics experience, these data have remained available, yet largely inaccessible. Current databases provide information about gene expression but fail to answer key questions regarding co-regulation, genetic programs or effect on patient survival. To address these shortcomings, we present BloodSpot (www.bloodspot.eu), which includes and greatly extends our previously released database HemaExplorer, a database of gene expression profiles from FACS sorted healthy and malignant haematopoietic cells. A revised interactive interface simultaneously provides a plot of gene expression along with a Kaplan-Meier analysis and a hierarchical tree depicting the relationship between different cell types in the database. The database now includes 23 high-quality curated data sets relevant to normal and malignant blood formation and, in addition, we have assembled and built a unique integrated data set, BloodPool. BloodPool contains more than 2000 samples assembled from six independent studies on acute myeloid leukemia. Furthermore, we have devised a robust sample integration procedure that allows for sensitive comparison of user-supplied patient samples in a well-defined haematopoietic cellular space.


Subject(s)
Genetic Databases, Gene Expression Profiling, Hematopoiesis/genetics, Acute Myeloid Leukemia/genetics, Genetic Transcription, Animals, Hematopoietic Stem Cells/metabolism, Humans, Acute Myeloid Leukemia/metabolism, Acute Myeloid Leukemia/mortality, Mice
6.
Magn Reson Med ; 73(3): 1171-6, 2015 Mar.
Article in English | MEDLINE | ID: mdl-24639209

ABSTRACT

PURPOSE: The short diffusion time regime provides an interesting probe for tissue microstructure and can be investigated with oscillating gradient spin echo (OGSE) experiments. Several studies report new contrasts in preclinical settings and the first in vivo human experiments have recently been presented. One major hurdle in practical implementation is the low effective diffusion weighting provided at high frequency with limited gradient strength. THEORY: As a solution to the low diffusion weighting of OGSE, circularly polarized OGSE (CP-OGSE) is introduced. CP-OGSE gives a twofold increase in diffusion weighting with encoding in a plane rather than in one direction. CP-OGSE can be used for rotationally invariant acquisitions on anisotropic tissues. METHODS: Experiments with a 4.7 T preclinical scanner on a postmortem monkey brain as well as simulations were performed using conventional OGSE and CP-OGSE. RESULTS: Simulations and experiments show that CP-OGSE provides the same microstructural information as OGSE but provides more robust parameter estimates with limited gradient strength. CONCLUSIONS: CP-OGSE can be an important contribution for making OGSE imaging more effective in clinical imaging settings with limited gradient strength. Furthermore, the improved diffusion weighting can also be used to expand the investigated frequency range.


Subject(s)
Algorithms, Cerebellum/anatomy & histology, Diffusion Magnetic Resonance Imaging/methods, Image Enhancement/methods, Computer-Assisted Image Interpretation/methods, Oscillometry/methods, Animals, Chlorocebus aethiops, Reproducibility of Results, Sensitivity and Specificity, Computer-Assisted Signal Processing, Software
8.
NMR Biomed ; 26(12): 1647-62, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24038641

ABSTRACT

Pulsed field gradient diffusion sequences (PFG) with multiple diffusion encoding blocks have been indicated to offer new microstructural tissue information, such as the ability to detect nonspherical compartment shapes in macroscopically isotropic samples, i.e. samples with negligible directional signal dependence on diffusion gradients in standard diffusion experiments. However, current acquisition schemes are not rotationally invariant in the sense that the derived metrics depend on the orientation of the sample, and are affected by the interplay of sampling directions and compartment orientation dispersion when applied to macroscopically anisotropic systems. Here we propose a new framework, the d-PFG 5-design, to enable rotationally invariant estimation of double wave vector diffusion metrics (d-PFG). The method is based on the idea that an appropriate orientational average of the signal emulates the signal from a powder preparation of the same sample, where macroscopic anisotropy is absent by construction. Our approach exploits the theory of exact numerical integration (quadrature) of polynomials on the rotation group, and we exemplify the general procedure with a set consisting of 60 pairs of diffusion wave vectors (the d-PFG 5-design) facilitating a theoretically exact determination of the fourth order Taylor or cumulant expansion of the orientationally averaged signal. The d-PFG 5-design is evaluated with numerical simulations and ex vivo high field diffusion MRI experiments in a nonhuman primate brain. Specifically, we demonstrate rotational invariance when estimating compartment eccentricity, which we show offers new microstructural information, complementary to that of fractional anisotropy (FA) from diffusion tensor imaging (DTI). The imaging observations are supported by a new theoretical result, directly relating compartment eccentricity to FA of individual pores.


Subject(s)
Diffusion Magnetic Resonance Imaging/methods, Animals, Anisotropy, Chlorocebus aethiops, Computer Simulation, Diffusion, Three-Dimensional Imaging, Rotation
9.
Nat Biotechnol ; 39(5): 555-560, 2021 May.
Article in English | MEDLINE | ID: mdl-33398153

ABSTRACT

Despite recent advances in metagenomic binning, reconstruction of microbial species from metagenomics data remains challenging. Here we develop variational autoencoders for metagenomic binning (VAMB), a program that uses deep variational autoencoders to encode sequence coabundance and k-mer distribution information before clustering. We show that a variational autoencoder is able to integrate these two distinct data types without any previous knowledge of the datasets. VAMB outperforms existing state-of-the-art binners, reconstructing 29-98% and 45% more near-complete (NC) genomes on simulated and real data, respectively. Furthermore, VAMB is able to separate closely related strains up to 99.5% average nucleotide identity (ANI), and reconstructed 255 and 91 NC Bacteroides vulgatus and Bacteroides dorei sample-specific genomes as two distinct clusters from a dataset of 1,000 human gut microbiome samples. We use 2,606 NC bins from this dataset to show that species of the human gut microbiome have different geographical distribution patterns. VAMB can be run on standard hardware and is freely available at https://github.com/RasmussenLab/vamb .
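
One of the two inputs named in the abstract above is k-mer distribution information. As a simple illustration, the sketch below turns a contig sequence into a normalized k-mer frequency vector. It is not VAMB's implementation: the default `k=4`, the decision to skip windows containing ambiguous bases, and the function name are assumptions made for this example.

```python
from collections import Counter
from itertools import product
from typing import Dict


def kmer_frequencies(sequence: str, k: int = 4) -> Dict[str, float]:
    """Return the relative frequency of every possible DNA k-mer in a sequence.

    Windows containing characters other than A, C, G, T (for example N)
    are skipped, which is an assumption of this sketch.
    """
    alphabet = set("ACGT")
    sequence = sequence.upper()
    counts: Counter = Counter()
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if set(kmer) <= alphabet:
            counts[kmer] += 1
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    if total == 0:
        return {kmer: 0.0 for kmer in all_kmers}
    return {kmer: counts[kmer] / total for kmer in all_kmers}


# Toy contig; real contigs are typically thousands of bases long.
profile = kmer_frequencies("ACGTACGT", k=2)
print(len(profile))   # 16 possible 2-mers
print(profile["AC"])  # 2 of the 7 windows
```

Each contig then has a fixed-length vector (4**k entries) regardless of its length, which is what makes it usable as input to an encoder alongside abundance values.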


Subject(s)
Bacterial Genome/genetics, Metagenome/genetics, Molecular Sequence Annotation, Software, Bacteroides/genetics, Humans, Metagenomics, Microbiota/genetics
10.
Nat Biotechnol ; 37(4): 420-423, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30778233

ABSTRACT

Signal peptides (SPs) are short amino acid sequences in the amino terminus of many newly synthesized proteins that target proteins into, or across, membranes. Bioinformatic tools can predict SPs from amino acid sequences, but most cannot distinguish between various types of signal peptides. We present a deep neural network-based approach that improves SP prediction across all domains of life and distinguishes between three types of prokaryotic SPs.


Subject(s)
Neural Networks (Computer), Protein Sorting Signals/genetics, Protein Sorting Signals/physiology, Algorithms, Amino Acid Sequence, Archaeal Proteins/classification, Archaeal Proteins/genetics, Archaeal Proteins/metabolism, Bacterial Proteins/classification, Bacterial Proteins/genetics, Bacterial Proteins/metabolism, Biotechnology, Computational Biology, Eukaryota/genetics, Eukaryota/metabolism, Protein Sequence Analysis, Software
11.
Cell Res ; 25(11): 1205-18, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26470845

ABSTRACT

ASXL1 mutations are frequently found in hematological tumors, and loss of Asxl1 promotes myeloid transformation in mice. Here we present data supporting a role for an ASXL1-BAP1 complex in the deubiquitylation of mono-ubiquitylated lysine 119 on Histone H2A (H2AK119ub1) in vivo. The Polycomb group proteins control the expression of the INK4B-ARF-INK4A locus during normal development, in part through catalyzing mono-ubiquitylation of H2AK119. Since activation of the INK4B-ARF-INK4A locus serves as a fail-safe mechanism protecting against tumorigenesis, we investigated whether ASXL1-dependent H2A deubiquitylation plays a role in its activation. Interestingly, we found that ASXL1 is specifically required for the increased expression of p15(INK4B) in response to both oncogenic signaling and extrinsic anti-proliferative signals. Since we found that ASXL1 and BAP1 are both enriched at the INK4B locus, our results suggest that activation of the INK4B locus requires ASXL1/BAP1-mediated deubiquitylation of H2AK119ub1. Consistently, our results show that ASXL1 mutations are associated with lower expression levels of p15(INK4B) and a proliferative advantage of hematopoietic progenitors in primary bone marrow cells, and that depletion of ASXL1 in multiple cell lines results in resistance to growth inhibitory signals. Taken together, this study links ASXL1-mediated H2A deubiquitylation and transcriptional activation of INK4B expression to its tumor suppressor functions.


Subject(s)
Cyclin-Dependent Kinase Inhibitor p15/metabolism, Histones/metabolism, Repressor Proteins/metabolism, Animals, Cell Line, Cell Proliferation, Humans, Mice, Mutation, Polycomb-Group Proteins/metabolism, Genetic Promoter Regions, Tumor Suppressor Proteins/metabolism, Ubiquitin Thiolesterase/metabolism, Ubiquitin-Specific Proteases/metabolism