Results 1 - 13 of 13
1.
Nat Commun ; 13(1): 5145, 2022 09 01.
Article in English | MEDLINE | ID: mdl-36050311

ABSTRACT

Existing weather forecasting models are based on physics and use supercomputers to evolve the atmosphere into the future. Better physics-based forecasts require improved atmospheric models, which can be difficult to discover and develop, or increasing the resolution underlying the simulation, which can be computationally prohibitive. An emerging class of weather models based on neural networks overcomes these limitations by learning the required transformations from data instead of relying on hand-coded physics and by running efficiently in parallel. Here we present a neural network capable of predicting precipitation at a high resolution up to 12 h ahead. The model predicts raw precipitation targets and, for lead times of up to 12 h, outperforms state-of-the-art physics-based models currently operating in the Continental United States. The results represent a substantial step towards validating the new class of neural weather models.


Subject(s)
Deep Learning , Computer Simulation , Forecasting , Neural Networks, Computer , Weather
2.
Nat Biotechnol ; 39(5): 555-560, 2021 05.
Article in English | MEDLINE | ID: mdl-33398153

ABSTRACT

Despite recent advances in metagenomic binning, reconstruction of microbial species from metagenomics data remains challenging. Here we develop variational autoencoders for metagenomic binning (VAMB), a program that uses deep variational autoencoders to encode sequence coabundance and k-mer distribution information before clustering. We show that a variational autoencoder is able to integrate these two distinct data types without any previous knowledge of the datasets. VAMB outperforms existing state-of-the-art binners, reconstructing 29-98% and 45% more near-complete (NC) genomes on simulated and real data, respectively. Furthermore, VAMB is able to separate closely related strains up to 99.5% average nucleotide identity (ANI), and reconstructed 255 and 91 NC Bacteroides vulgatus and Bacteroides dorei sample-specific genomes as two distinct clusters from a dataset of 1,000 human gut microbiome samples. We use 2,606 NC bins from this dataset to show that species of the human gut microbiome have different geographical distribution patterns. VAMB can be run on standard hardware and is freely available at https://github.com/RasmussenLab/vamb .


Subject(s)
Genome, Bacterial/genetics , Metagenome/genetics , Molecular Sequence Annotation , Software , Bacteroides/genetics , Humans , Metagenomics , Microbiota/genetics
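
The abstract above names k-mer distribution as one of the two inputs that are encoded before clustering. As a rough illustration of what such a feature looks like, the following is a minimal sketch that turns one contig into a fixed-length vector of k-mer frequencies. It is not taken from VAMB: the function name `kmer_frequencies`, the default `k=4`, and the handling of ambiguous bases are assumptions made for this example, and the real program may normalise or collapse k-mers differently.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=4):
    """Return the relative frequency of every DNA k-mer in `sequence`.

    Hypothetical helper for illustration only. Windows containing
    characters other than A, C, G, T are skipped.
    """
    sequence = sequence.upper()
    alphabet = "ACGT"
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if all(base in alphabet for base in kmer):
            counts[kmer] += 1
    total = sum(counts.values())
    # Enumerate k-mers in a fixed order (AA..., AC..., ...) so that every
    # contig yields a vector of the same length, 4**k.
    all_kmers = ("".join(p) for p in product(alphabet, repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


# Usage: dinucleotide frequencies of a short toy sequence.
freqs = kmer_frequencies("ACGTACGTAC", k=2)
```

Because every contig maps to a vector of the same length, vectors of this kind can be concatenated with per-sample abundance values and passed to an encoder.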
3.
Bioinformatics ; 36(16): 4415-4422, 2020 08 15.
Article in English | MEDLINE | ID: mdl-32415966

ABSTRACT

MOTIVATION: Models for analysing and making relevant biological inferences from massive amounts of complex single-cell transcriptomic data typically require several individual data-processing steps, each with its own set of hyperparameter choices. With deep generative models one can work directly with count data, perform likelihood-based model comparison, learn a latent representation of the cells and capture more of the variability in different cell populations. RESULTS: We propose a novel method based on variational auto-encoders (VAEs) for analysis of single-cell RNA sequencing (scRNA-seq) data. It avoids data preprocessing by using raw count data as input and can robustly estimate the expected gene expression levels and a latent representation for each cell. We tested several count likelihood functions and a variant of the VAE that has a priori clustering in the latent space. We show for several scRNA-seq datasets that our method outperforms recently proposed scRNA-seq methods in clustering cells and that the resulting clusters reflect cell types. AVAILABILITY AND IMPLEMENTATION: Our method, called scVAE, is implemented in Python using the TensorFlow machine-learning library, and it is freely available at https://github.com/scvae/scvae. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Likelihood Functions , Sequence Analysis, RNA , Software
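
The abstract above mentions testing several count likelihood functions on raw count data but does not list them. As a hedged illustration of what a count likelihood is, the sketch below evaluates two distributions that are commonly used for count data, the Poisson and the negative binomial. Choosing these two, and the function names, are assumptions made for this example; this is not the scVAE code.

```python
import math


def poisson_log_pmf(count, rate):
    """Log-probability of `count` under a Poisson with mean `rate` (> 0)."""
    return count * math.log(rate) - rate - math.lgamma(count + 1)


def negative_binomial_log_pmf(count, mean, dispersion):
    """Log-probability of `count` under a negative binomial.

    Parameterised by its mean (> 0) and a dispersion r > 0, so that the
    variance is mean + mean**2 / r. As r grows it approaches the Poisson.
    """
    r = dispersion
    return (
        math.lgamma(count + r) - math.lgamma(r) - math.lgamma(count + 1)
        + r * math.log(r / (r + mean))
        + count * math.log(mean / (r + mean))
    )


def total_log_likelihood(counts, means, log_pmf, **params):
    """Sum the log-likelihood of observed counts given predicted means."""
    return sum(log_pmf(c, m, **params) for c, m in zip(counts, means))


# Usage: compare two likelihoods on the same toy counts and predicted means.
counts = [0, 3, 12, 1]
means = [0.5, 2.0, 6.0, 1.5]
ll_poisson = total_log_likelihood(counts, means, poisson_log_pmf)
ll_negbin = total_log_likelihood(
    counts, means, negative_binomial_log_pmf, dispersion=2.0
)
```

In a model of this kind, the predicted means would come from the decoder network, and the summed log-likelihood would form the reconstruction term of the training objective.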
4.
Proteins ; 87(6): 520-527, 2019 06.
Article in English | MEDLINE | ID: mdl-30785653

ABSTRACT

The ability to predict local structural features of a protein from the primary sequence is of paramount importance for unraveling its function in the absence of experimental structural information. Two main factors affect the utility of potential prediction tools: their accuracy must enable extraction of reliable structural information on the proteins of interest, and their runtime must be low to keep pace with sequencing data being generated at a constantly increasing speed. Here, we present NetSurfP-2.0, a novel tool that can predict the most important local structural features with unprecedented accuracy and runtime. NetSurfP-2.0 is sequence-based and uses an architecture composed of convolutional and long short-term memory neural networks trained on solved protein structures. Using a single integrated model, NetSurfP-2.0 predicts solvent accessibility, secondary structure, structural disorder, and backbone dihedral angles for each residue of the input sequences. We assessed the accuracy of NetSurfP-2.0 on several independent test datasets and found it to consistently produce state-of-the-art predictions for each of its output features. We observe a correlation of 80% between predictions and experimental data for solvent accessibility, and a precision of 85% on secondary structure 3-class predictions. In addition to improved accuracy, the processing time has been optimized to allow predicting more than 1000 proteins in less than 2 hours, and complete proteomes in less than 1 day.


Subject(s)
Databases, Protein , Deep Learning , Computational Biology , Protein Structure, Secondary , Proteome/chemistry
5.
Nat Biotechnol ; 37(4): 420-423, 2019 04.
Article in English | MEDLINE | ID: mdl-30778233

ABSTRACT

Signal peptides (SPs) are short amino acid sequences in the amino terminus of many newly synthesized proteins that target proteins into, or across, membranes. Bioinformatic tools can predict SPs from amino acid sequences, but most cannot distinguish between various types of signal peptides. We present a deep neural network-based approach that improves SP prediction across all domains of life and distinguishes between three types of prokaryotic SPs.


Subject(s)
Neural Networks, Computer , Protein Sorting Signals/genetics , Protein Sorting Signals/physiology , Algorithms , Amino Acid Sequence , Archaeal Proteins/classification , Archaeal Proteins/genetics , Archaeal Proteins/metabolism , Bacterial Proteins/classification , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Biotechnology , Computational Biology , Eukaryota/genetics , Eukaryota/metabolism , Sequence Analysis, Protein , Software
7.
Bioinformatics ; 33(21): 3387-3395, 2017 Nov 01.
Article in English | MEDLINE | ID: mdl-29036616

ABSTRACT

MOTIVATION: The prediction of eukaryotic protein subcellular localization is a well-studied topic in bioinformatics due to its relevance in proteomics research. Many machine learning methods have been successfully applied in this task, but in most of them, predictions rely on annotation of homologues from knowledge databases. For novel proteins where no annotated homologues exist, and for predicting the effects of sequence variants, it is desirable to have methods for predicting protein properties from sequence information only. RESULTS: Here, we present a prediction algorithm using deep neural networks to predict protein subcellular localization relying only on sequence information. At its core, the prediction model uses a recurrent neural network that processes the entire protein sequence and an attention mechanism identifying protein regions important for the subcellular localization. The model was trained and tested on a protein dataset extracted from one of the latest UniProt releases, in which experimentally annotated proteins follow more stringent criteria than previously. We demonstrate that our model achieves a good accuracy (78% for 10 categories; 92% for membrane-bound or soluble), outperforming current state-of-the-art algorithms, including those relying on homology information. AVAILABILITY AND IMPLEMENTATION: The method is available as a web server at http://www.cbs.dtu.dk/services/DeepLoc. Example code is available at https://github.com/JJAlmagro/subcellular_localization. The dataset is available at http://www.cbs.dtu.dk/services/DeepLoc/data.php. CONTACT: jjalma@dtu.dk.


Subject(s)
Computational Biology/methods , Machine Learning , Protein Transport , Sequence Analysis, Protein/methods , Software , Eukaryota/metabolism , Eukaryotic Cells/metabolism , Models, Biological , Molecular Sequence Annotation/methods , Neural Networks, Computer
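
The abstract above describes an attention mechanism that identifies the protein regions important for localization. The general idea can be sketched as softmax-weighted pooling over sequence positions. This is a minimal illustration and not the DeepLoc architecture: the per-position scores are passed in directly here, whereas in a trained model they would be computed by learned layers, and the names `softmax` and `attention_pool` are chosen for this example.

```python
import math


def softmax(scores):
    """Convert arbitrary real scores into positive weights that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, scores):
    """Pool per-position vectors into one vector using attention weights.

    hidden_states: list of equal-length vectors, one per sequence position.
    scores: one relevance score per position (assumed to be given).
    Returns (context_vector, weights).
    """
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return context, weights


# Usage: three positions with two features each; position 1 scores highest.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = attention_pool(hidden, [0.0, 2.0, 0.0])
```

The weights can be read as a per-residue importance profile, and the pooled vector has a fixed size regardless of sequence length.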
8.
Bioinformatics ; 33(22): 3685-3690, 2017 Nov 15.
Article in English | MEDLINE | ID: mdl-28961695

ABSTRACT

MOTIVATION: Deep neural network architectures such as convolutional and long short-term memory networks have become increasingly popular as machine learning tools in recent years. The availability of greater computational resources, more data, new algorithms for training deep models and easy-to-use libraries for implementation and training of neural networks are the drivers of this development. The use of deep learning has been especially successful in image recognition, and the development of tools, applications and code examples is in most cases centered within this field rather than within biology. RESULTS: Here, we aim to further the development of deep learning methods within biology by providing application examples and code templates that are ready to apply and adapt. Given such examples, we illustrate how architectures consisting of convolutional and long short-term memory neural networks can relatively easily be designed and trained to state-of-the-art performance on three biological sequence problems: prediction of subcellular localization, protein secondary structure and the binding of peptides to MHC Class II molecules. AVAILABILITY AND IMPLEMENTATION: All implementations and datasets are available online to the scientific community at https://github.com/vanessajurtz/lasagne4bio. CONTACT: skaaesonderby@gmail.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Machine Learning , Protein Structure, Secondary , Protein Transport , Sequence Analysis, Protein/methods , Computational Biology/methods , Neural Networks, Computer , Peptides/metabolism , Protein Binding
9.
Nucleic Acids Res ; 44(D1): D917-24, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26507857

ABSTRACT

Research on human and murine haematopoiesis has resulted in a vast number of gene-expression data sets that can potentially answer questions regarding normal and aberrant blood formation. To researchers and clinicians with limited bioinformatics experience, these data have remained available, yet largely inaccessible. Current databases provide information about gene expression but fail to answer key questions regarding co-regulation, genetic programs or effect on patient survival. To address these shortcomings, we present BloodSpot (www.bloodspot.eu), which includes and greatly extends our previously released database HemaExplorer, a database of gene expression profiles from FACS-sorted healthy and malignant haematopoietic cells. A revised interactive interface simultaneously provides a plot of gene expression along with a Kaplan-Meier analysis and a hierarchical tree depicting the relationship between different cell types in the database. The database now includes 23 high-quality curated data sets relevant to normal and malignant blood formation and, in addition, we have assembled and built a unique integrated data set, BloodPool. BloodPool contains more than 2000 samples assembled from six independent studies on acute myeloid leukemia. Furthermore, we have devised a robust sample integration procedure that allows for sensitive comparison of user-supplied patient samples in a well-defined haematopoietic cellular space.


Subject(s)
Databases, Genetic , Gene Expression Profiling , Hematopoiesis/genetics , Leukemia, Myeloid, Acute/genetics , Transcription, Genetic , Animals , Hematopoietic Stem Cells/metabolism , Humans , Leukemia, Myeloid, Acute/metabolism , Leukemia, Myeloid, Acute/mortality , Mice
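
The abstract above refers to a Kaplan-Meier analysis shown alongside the expression plot. For readers unfamiliar with the estimator, the following is a minimal sketch of the standard product-limit calculation. It is not BloodSpot's implementation; the function name and the toy data are assumptions made for this example.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) estimate of the survival function.

    times:  follow-up duration for each patient.
    events: 1 if the event (e.g. death) was observed at that time,
            0 if the patient was censored.
    Returns a list of (time, survival_probability) pairs, one per
    distinct time at which at least one event was observed.
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for ti in times if ti >= t)
        # Events observed exactly at time t.
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Usage: four patients, one of them censored at time 2.
curve = kaplan_meier(times=[1, 2, 2, 3], events=[1, 0, 1, 1])
```

Censored patients count as at risk until their censoring time but never trigger a drop in the curve. A tool of this kind would typically split patients into groups, for example by expression of a chosen gene, and compute one such curve per group.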
10.
Cell Res ; 25(11): 1205-18, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26470845

ABSTRACT

ASXL1 mutations are frequently found in hematological tumors, and loss of Asxl1 promotes myeloid transformation in mice. Here we present data supporting a role for an ASXL1-BAP1 complex in the deubiquitylation of mono-ubiquitylated lysine 119 on Histone H2A (H2AK119ub1) in vivo. The Polycomb group proteins control the expression of the INK4B-ARF-INK4A locus during normal development, in part through catalyzing mono-ubiquitylation of H2AK119. Since the activation of the INK4B-ARF-INK4A locus acts as a fail-safe mechanism protecting against tumorigenesis, we investigated whether ASXL1-dependent H2A deubiquitylation plays a role in its activation. Interestingly, we found that ASXL1 is specifically required for the increased expression of p15(INK4B) in response to both oncogenic signaling and extrinsic anti-proliferative signals. Since we found that ASXL1 and BAP1 are both enriched at the INK4B locus, our results suggest that activation of the INK4B locus requires ASXL1/BAP1-mediated deubiquitylation of H2AK119ub1. Consistently, our results show that ASXL1 mutations are associated with lower expression levels of p15(INK4B) and a proliferative advantage of hematopoietic progenitors in primary bone marrow cells, and that depletion of ASXL1 in multiple cell lines results in resistance to growth inhibitory signals. Taken together, this study links ASXL1-mediated H2A deubiquitylation and transcriptional activation of INK4B expression to its tumor suppressor functions.


Subject(s)
Cyclin-Dependent Kinase Inhibitor p15/metabolism , Histones/metabolism , Repressor Proteins/metabolism , Animals , Cell Line , Cell Proliferation , Humans , Mice , Mutation , Polycomb-Group Proteins/metabolism , Promoter Regions, Genetic , Tumor Suppressor Proteins/metabolism , Ubiquitin Thiolesterase/metabolism , Ubiquitin-Specific Proteases/metabolism
11.
Magn Reson Med ; 73(3): 1171-6, 2015 Mar.
Article in English | MEDLINE | ID: mdl-24639209

ABSTRACT

PURPOSE: The short diffusion time regime provides an interesting probe for tissue microstructure and can be investigated with oscillating gradient spin echo (OGSE) experiments. Several studies report new contrasts in preclinical settings and the first in vivo human experiments have recently been presented. One major hurdle in practical implementation is the low effective diffusion weighting provided at high frequency with limited gradient strength. THEORY: As a solution to the low diffusion weighting of OGSE, circularly polarized OGSE (CP-OGSE) is introduced. CP-OGSE gives a twofold increase in diffusion weighting with encoding in a plane rather than in one direction. CP-OGSE can be used for rotationally invariant acquisitions on anisotropic tissues. METHODS: Experiments with a 4.7 T preclinical scanner on a postmortem monkey brain as well as simulations were performed using conventional OGSE and CP-OGSE. RESULTS: Simulations and experiments show that CP-OGSE provides the same microstructural information as OGSE but provides more robust parameter estimates with limited gradient strength. CONCLUSIONS: CP-OGSE can be an important contribution for making OGSE imaging more effective in clinical imaging settings with limited gradient strength. Furthermore, the improved diffusion weighting can also be used to expand the investigated frequency range.


Subject(s)
Algorithms , Cerebellum/anatomy & histology , Diffusion Magnetic Resonance Imaging/methods , Image Enhancement/methods , Image Interpretation, Computer-Assisted/methods , Oscillometry/methods , Animals , Chlorocebus aethiops , Reproducibility of Results , Sensitivity and Specificity , Signal Processing, Computer-Assisted , Software
12.
Magn Reson Med ; 72(3): 756-62, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24123426

ABSTRACT

PURPOSE: Double-wave diffusion experiments offer the possibility of probing correlation between molecular diffusion at multiple time points. It has recently been shown that this technique is capable of measuring the exchange of water across cellular membranes. The aim of this study was to investigate the effect of macroscopic tissue anisotropy on the measurement of the apparent exchange rate (AXR) in multicompartment systems. METHODS: AXR data were collected from yeast and perfusion-fixated brain tissue at high angular resolution on a preclinical imaging system. The AXR was expanded for anisotropic systems by calculating scalar AXR values along the principal directions of the diffusion tensor. RESULTS: In yeast, both the AXR and diffusivity were rotationally invariant, whereas in fixated brain tissue, the measured AXR was sensitive to the orientation of anisotropic structures. AXR, especially in white matter, was robustly estimated along the first and second principal directions of the diffusion tensor, but increasing noise was seen in the AXR estimates along the third principal direction of the diffusion tensor. CONCLUSION: Our results indicate that tissue anisotropy must be considered for AXR estimates in complex biological systems.


Subject(s)
Brain/anatomy & histology , Diffusion Tensor Imaging/methods , Yeasts/cytology , Animals , Anisotropy , Chlorocebus aethiops
13.
NMR Biomed ; 26(12): 1647-62, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24038641

ABSTRACT

Pulsed field gradient diffusion sequences (PFG) with multiple diffusion encoding blocks have been indicated to offer new microstructural tissue information, such as the ability to detect nonspherical compartment shapes in macroscopically isotropic samples, i.e. samples with negligible directional signal dependence on diffusion gradients in standard diffusion experiments. However, current acquisition schemes are not rotationally invariant in the sense that the derived metrics depend on the orientation of the sample, and are affected by the interplay of sampling directions and compartment orientation dispersion when applied to macroscopically anisotropic systems. Here we propose a new framework, the d-PFG 5-design, to enable rotationally invariant estimation of double wave vector diffusion metrics (d-PFG). The method is based on the idea that an appropriate orientational average of the signal emulates the signal from a powder preparation of the same sample, where macroscopic anisotropy is absent by construction. Our approach exploits the theory of exact numerical integration (quadrature) of polynomials on the rotation group, and we exemplify the general procedure with a set consisting of 60 pairs of diffusion wave vectors (the d-PFG 5-design) facilitating a theoretically exact determination of the fourth order Taylor or cumulant expansion of the orientationally averaged signal. The d-PFG 5-design is evaluated with numerical simulations and ex vivo high field diffusion MRI experiments in a nonhuman primate brain. Specifically, we demonstrate rotational invariance when estimating compartment eccentricity, which we show offers new microstructural information, complementary to that of fractional anisotropy (FA) from diffusion tensor imaging (DTI). The imaging observations are supported by a new theoretical result, directly relating compartment eccentricity to FA of individual pores.


Subject(s)
Diffusion Magnetic Resonance Imaging/methods , Animals , Anisotropy , Chlorocebus aethiops , Computer Simulation , Diffusion , Imaging, Three-Dimensional , Rotation
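
The abstract above contrasts compartment eccentricity with fractional anisotropy (FA) from diffusion tensor imaging. As a reminder of the quantity being compared against, the sketch below computes FA from the three eigenvalues of a diffusion tensor using the standard definition. It does not implement the d-PFG 5-design or the eccentricity estimate; the function name and the example eigenvalues are assumptions made for this example.

```python
import math


def fractional_anisotropy(eigenvalues):
    """Fractional anisotropy of a diffusion tensor from its eigenvalues.

    FA = sqrt(3/2) * sqrt(sum((l_i - mean)^2)) / sqrt(sum(l_i^2)),
    which is 0 for isotropic diffusion and 1 when diffusion occurs
    along a single direction only.
    """
    l1, l2, l3 = eigenvalues
    mean = (l1 + l2 + l3) / 3.0
    numerator = (l1 - mean) ** 2 + (l2 - mean) ** 2 + (l3 - mean) ** 2
    denominator = l1 ** 2 + l2 ** 2 + l3 ** 2
    if denominator == 0.0:
        return 0.0  # no diffusion at all: treat as isotropic
    return math.sqrt(1.5 * numerator / denominator)


# Usage: illustrative eigenvalues (arbitrary units) for an elongated tensor.
fa = fractional_anisotropy((1.7, 0.3, 0.3))
```

FA depends only on the eigenvalues, so it is itself rotationally invariant. It is, however, computed from a voxel-averaged tensor, so it reflects macroscopic rather than per-compartment anisotropy.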