1.
J Cheminform ; 14(1): 47, 2022 Jul 15.
Article in English | MEDLINE | ID: mdl-35841114

ABSTRACT

Comparing chemical structures to infer protein targets and functions is a common approach, but basing comparisons on chemical similarity alone can be misleading. Here we present a methodology for predicting target protein clusters using deep neural networks. The model is trained on clusters of compounds based on similarities calculated from combined compound-protein and protein-protein interaction data using a network topology approach. We compare several deep learning architectures including both convolutional and recurrent neural networks. The best performing method, the recurrent neural network architecture MolPMoFiT, achieved an F1 score approaching 0.9 on a held-out test set of 8907 compounds. In addition, in-depth analysis on a set of eleven well-studied chemical compounds with known functions showed that predictions were justifiable for all but one of the chemicals. Four of the compounds, similar in their molecular structure but with dissimilarities in their function, revealed advantages of our method compared to using chemical similarity.

2.
Comput Methods Programs Biomed ; 209: 106318, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34375851

ABSTRACT

BACKGROUND AND OBJECTIVE: To achieve the full potential of deep learning (DL) models, such as understanding the interplay between model (size), training strategy, and amount of training data, researchers and developers need access to new dedicated image datasets; i.e., annotated collections of images representing real-world problems with all their variations, complexity, limitations, and noise. Here, we present, describe and make freely available an annotated transmission electron microscopy (TEM) image dataset. It constitutes an interesting challenge for many practical applications in virology and epidemiology; e.g., virus detection, segmentation, classification, and novelty detection. We also present benchmarking results for virus detection and recognition using some of the top-performing (large and small) networks as well as a handcrafted very small network. We compare and evaluate transfer learning and training from scratch hypothesizing that with a limited dataset, transfer learning is crucial for good performance of a large network whereas our handcrafted small network performs relatively well when training from scratch. This is one step towards understanding how much training data is needed for a given task. METHODS: The benchmark dataset contains 1245 images of 22 virus classes. We propose a representative data split into training, validation, and test sets for this dataset. Moreover, we compare different established DL networks and present a baseline DL solution for classifying a subset of the 14 most-represented virus classes in the dataset. RESULTS: Our best model, DenseNet201 pre-trained on ImageNet and fine-tuned on the training set, achieved a 0.921 F1-score and 93.1% accuracy on the proposed representative test set. 
CONCLUSIONS: Public and real biomedical datasets are an important contribution and a necessity to increase the understanding of shortcomings, requirements, and potential improvements for deep learning solutions on biomedical problems or deploying solutions in clinical settings. We compared transfer learning to learning from scratch on this dataset and hypothesize that for limited-sized datasets transfer learning is crucial for achieving good performance for large models. Last but not least, we demonstrate the importance of application knowledge in creating datasets for training DL models and analyzing their results.
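
The abstract does not describe how the representative training, validation, and test split was constructed. As a rough illustration only, a class-stratified split could look like the sketch below; the function name `stratified_split`, the 70/15/15 fractions, and the seed are assumptions, not details from the paper.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists that keep per-class proportions.

    `fractions` are hypothetical defaults; the third value is implied by the
    first two because the test set receives whatever remains.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(fractions[0] * len(idxs))
        n_val = round(fractions[1] * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # remainder of this class
    return train, val, test
```

Splitting each class separately keeps rare virus classes represented in all three sets, which a plain random split of a small dataset does not guarantee.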


Subject(s)
Deep Learning , Neural Networks, Computer , Benchmarking , Microscopy, Electron, Transmission
3.
Comput Methods Programs Biomed ; 178: 31-39, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31416558

ABSTRACT

BACKGROUND AND OBJECTIVE: Convolutional neural networks (CNNs) offer performance comparable to that of human experts while being faster and more consistent in their predictions. However, most proposed CNNs require expensive state-of-the-art hardware, which substantially limits their use in practical scenarios and commercial systems, especially for clinical, biomedical, and other applications that require on-the-fly analysis. In this paper, we investigate the possibility of making CNNs lighter by parametrizing the architecture and decreasing the number of trainable weights of a popular CNN: U-Net. METHODS: To demonstrate that comparable results can be achieved with substantially fewer trainable weights than the original U-Net, we used a challenging application: pixel-wise virus classification in transmission electron microscopy images with minimal annotations (i.e., consisting only of the virus particle centers or centerlines). We explored four U-Net hyper-parameters: the number of base feature maps, the feature maps multiplier, the number of encoding-decoding levels, and the number of feature maps in the last two convolutional layers. RESULTS: Our experiments lead to two main conclusions: 1) the architecture hyper-parameters are pivotal if fewer trainable weights are to be used, and 2) if there is no restriction on the number of trainable weights, using a deeper network generally gives better results. However, training larger networks takes longer and typically requires more data, and such networks are also more prone to overfitting. Our best model achieved an accuracy of 82.2%, which is similar to the original U-Net, while using nearly four times fewer trainable weights (7.8 M compared with 31.0 M). We also present a network with < 2 M trainable weights that achieved an accuracy of 76.4%.
CONCLUSIONS: The proposed U-Net hyper-parameter exploration can be adapted to other CNNs and other applications. It allows comprehensive CNN architecture design with the aim of using trainable weights more efficiently. Making the networks faster and lighter is crucial for their implementation in many practical applications. In addition, a lighter network ought to be less prone to overfitting and hence generalize better.
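
To make the effect of the four hyper-parameters concrete, the sketch below counts trainable weights for a U-Net-style network. It assumes the commonly described layout (two 3x3 convolutions per level, 2x2 up-convolutions, a final 1x1 convolution, no normalization layers); the function names, the one-channel input, the two output classes, and the reading of `last_maps` as the width of the two final 3x3 convolutions are assumptions, not details taken from the paper.

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out


def unet_param_count(base_maps=64, multiplier=2, levels=5,
                     last_maps=None, in_channels=1, n_classes=2):
    """Count trainable weights of a U-Net-style network (assumed layout)."""
    # Feature maps per level: base, base * m, base * m^2, ...
    widths = [int(base_maps * multiplier ** i) for i in range(levels)]

    total = 0
    c_in = in_channels
    # Encoder, including the bottleneck: two 3x3 convolutions per level.
    for w in widths:
        total += conv_params(c_in, w, 3) + conv_params(w, w, 3)
        c_in = w

    # Decoder: up-convolution, concatenation with the skip, two 3x3 convs.
    for i in range(levels - 2, -1, -1):
        skip = widths[i]
        out = last_maps if (i == 0 and last_maps is not None) else skip
        total += conv_params(c_in, skip, 2)      # 2x2 up-convolution
        total += conv_params(2 * skip, out, 3)   # input is skip + upsampled
        total += conv_params(out, out, 3)
        c_in = out

    total += conv_params(c_in, n_classes, 1)     # final 1x1 classifier
    return total


print(f"{unet_param_count() / 1e6:.1f} M")  # about 31.0 M for this layout
```

Under these assumptions the defaults give roughly the 31.0 M weights quoted for the original U-Net, and lowering `base_maps`, `multiplier`, or `levels` shows how quickly the count falls.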


Subject(s)
Microscopy, Electron, Transmission/methods , Viruses/ultrastructure , Algorithms , Computer Systems , Databases, Factual , Deep Learning , Image Processing, Computer-Assisted , Neural Networks, Computer , Reproducibility of Results
4.
SLAS Discov ; 23(10): 1030-1039, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30074852

ABSTRACT

Image-based analysis is an increasingly important tool to characterize the effect of drugs in large-scale chemical screens. Herein, we present image and data analysis methods to investigate population cell-cycle dynamics in patient-derived brain tumor cells. Images of glioblastoma cells grown in multiwell plates were used to extract per-cell descriptors, including nuclear DNA content. We reduced the DNA content data from per-cell descriptors to per-well frequency distributions, which were used to identify compounds affecting cell-cycle phase distribution. We analyzed cells from 15 patient cases representing multiple subtypes of glioblastoma and searched for clusters of cell-cycle phase distributions characterizing similarities in response to 249 compounds at 11 doses. We show that this approach applied in a blind analysis with unlabeled substances identified drugs that are commonly used for treating solid tumors as well as other compounds that are well known for inducing cell-cycle arrest. Redistribution of nuclear DNA content signals is thus a robust metric of cell-cycle arrest in patient-derived glioblastoma cells.
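
The reduction from per-cell descriptors to per-well frequency distributions can be sketched as follows. This is a minimal illustration, assuming DNA content is given as (well, value) pairs; the bin count, the value range, and the L1 distance used for comparison are hypothetical choices, not the authors' settings.

```python
from collections import defaultdict


def per_well_histograms(cells, n_bins=8, lo=0.0, hi=4.0):
    """Map well id -> normalised histogram of per-cell DNA content.

    `cells` is an iterable of (well_id, dna_content) pairs.
    """
    counts = defaultdict(lambda: [0] * n_bins)
    width = (hi - lo) / n_bins
    for well, value in cells:
        # Clamp out-of-range values into the first or last bin.
        idx = min(max(int((value - lo) / width), 0), n_bins - 1)
        counts[well][idx] += 1
    return {w: [c / sum(h) for c in h] for w, h in counts.items()}


def l1_distance(p, q):
    """Sum of absolute differences between two frequency distributions."""
    return sum(abs(a - b) for a, b in zip(p, q))
```

Because each histogram is normalised, wells with different cell counts can be compared directly, and a large distance from an untreated well suggests a shift in cell-cycle phase distribution.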


Subject(s)
Antineoplastic Agents/pharmacology , Cell Cycle/drug effects , Drug Screening Assays, Antitumor/methods , Molecular Imaging/methods , Antineoplastic Agents/therapeutic use , Brain Neoplasms/drug therapy , Cell Line, Tumor , Dose-Response Relationship, Drug , Flow Cytometry/methods , Glioblastoma/drug therapy , Humans , Small Molecule Libraries
5.
PLoS One ; 12(11): e0188496, 2017.
Article in English | MEDLINE | ID: mdl-29190737

ABSTRACT

The choice of an optimal feature detector-descriptor combination for image matching often depends on the application and the image type. In this paper, we propose the Log-Polar Magnitude feature descriptor, a rotation-, scale-, and illumination-invariant descriptor that achieves performance comparable to SIFT on a large variety of image registration problems but with much shorter feature vectors. The descriptor is based on the Log-Polar Transform followed by a Fourier Transform and selection of the magnitude spectrum components. Selecting different frequency components allows optimizing for image patterns specific to a particular application. In addition, by relying only on the coordinates of the found features and (optionally) feature sizes, our descriptor is completely detector-independent. We propose 48- or 56-long feature vectors that potentially can be shortened even further depending on the application. Shorter feature vectors result in better memory usage and faster matching. This, combined with the fact that the descriptor does not require a time-consuming feature orientation estimation (the rotation invariance is achieved solely by using the magnitude spectrum of the Log-Polar Transform), makes it particularly attractive to applications with limited hardware capacity. Evaluation is performed on the standard Oxford dataset and two different microscopy datasets: one with fluorescence and one with transmission electron microscopy images. Our method performs better than SURF and comparably to SIFT on the Oxford dataset, and better than SIFT on both microscopy datasets, indicating that it is particularly useful in applications with microscopy images.
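
The pipeline described above (log-polar sampling, Fourier transform, magnitude selection) can be sketched in a few lines. This is an illustrative reading, not the authors' implementation: the grid size, nearest-neighbour sampling, the choice of the 6 x 8 lowest-frequency block (which happens to give 48 values), and the unit-length normalisation are all assumptions.

```python
import cmath
import math


def log_polar_patch(image, cy, cx, r_min=1.0, r_max=16.0, n_rho=16, n_theta=32):
    """Sample a 2-D list `image` on a log-polar grid centred on (cy, cx)."""
    h, w = len(image), len(image[0])
    patch = []
    for i in range(n_rho):
        # Radii are evenly spaced in log(r), so scaling shifts rows.
        r = r_min * (r_max / r_min) ** (i / (n_rho - 1))
        row = []
        for j in range(n_theta):
            # Angles are evenly spaced, so rotation shifts columns circularly.
            t = 2.0 * math.pi * j / n_theta
            y = min(max(round(cy + r * math.sin(t)), 0), h - 1)
            x = min(max(round(cx + r * math.cos(t)), 0), w - 1)
            row.append(float(image[y][x]))
        patch.append(row)
    return patch


def magnitude_descriptor(patch, n_rho_freq=6, n_theta_freq=8):
    """Magnitudes of the lowest DFT frequencies, normalised to unit length."""
    n_rho, n_theta = len(patch), len(patch[0])
    desc = []
    for u in range(n_rho_freq):
        for v in range(n_theta_freq):
            acc = 0j
            for i in range(n_rho):
                for j in range(n_theta):
                    phase = -2j * math.pi * (u * i / n_rho + v * j / n_theta)
                    acc += patch[i][j] * cmath.exp(phase)
            desc.append(abs(acc))  # the magnitude discards the shift phase
    norm = math.sqrt(sum(d * d for d in desc))
    return [d / norm for d in desc] if norm > 0 else desc
```

A rotation of the image patch becomes a circular shift along the angle axis of the log-polar grid, and a circular shift changes only the phase of the Fourier coefficients, so the magnitudes stay the same without estimating a feature orientation. Scaling shifts the radius axis in a non-circular way, so invariance to scale is only approximate in this sketch. Dividing by the vector norm removes a multiplicative change in intensity.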


Subject(s)
Pattern Recognition, Automated , Fourier Analysis
6.
Nat Commun ; 8(1): 1541, 2017 Nov 16.
Article in English | MEDLINE | ID: mdl-29142246

ABSTRACT

The NUDIX enzymes are involved in cellular metabolism and homeostasis, as well as mRNA processing. Although highly conserved throughout all organisms, their biological roles and biochemical redundancies remain largely unclear. To address this, we globally resolve their individual properties and inter-relationships. We purify 18 of the human NUDIX proteins and screen 52 substrates, providing a substrate redundancy map. Using crystal structures, we generate sequence alignment analyses revealing four major structural classes. To a certain extent, their substrate preference redundancies correlate with structural classes, thus linking structure and activity relationships. To elucidate interdependence among the NUDIX hydrolases, we pairwise deplete them generating an epistatic interaction map, evaluate cell cycle perturbations upon knockdown in normal and cancer cells, and analyse their protein and mRNA expression in normal and cancer tissues. Using a novel FUSION algorithm, we integrate all data creating a comprehensive NUDIX enzyme profile map, which will prove fundamental to understanding their biological functionality.


Subject(s)
Gene Expression Profiling/methods , Gene Regulatory Networks , Multigene Family , Pyrophosphatases/genetics , A549 Cells , Cell Line , Cell Line, Tumor , Gene Expression Regulation, Enzymologic , Gene Expression Regulation, Neoplastic , Humans , MCF-7 Cells , Phylogeny , Pyrophosphatases/classification , Pyrophosphatases/metabolism , RNA Interference , Substrate Specificity , Nudix Hydrolases
7.
PLoS One ; 11(3): e0151554, 2016.
Article in English | MEDLINE | ID: mdl-26987120

ABSTRACT

Image-based screening typically produces quantitative measurements of cell appearance. Large-scale screens involving tens of thousands of images, each containing hundreds of cells described by hundreds of measurements, result in overwhelming amounts of data. Reducing per-cell measurements to the averages across the image(s) for each treatment leads to loss of potentially valuable information on population variability. We present PopulationProfiler, a new software tool that reduces per-cell measurements to population statistics. The software imports measurements from a simple text file, visualizes population distributions in a compact and comprehensive way, and can create gates for subpopulation classes based on control samples. We validate the tool by showing how PopulationProfiler can be used to analyze the effect of drugs that disturb the cell cycle, and compare the results to those obtained with flow cytometry.
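
Gating based on control samples might be sketched as below. This is not PopulationProfiler's actual algorithm: taking the control median as the G1 peak and the gate boundaries at fixed multiples of that peak are hypothetical choices made only to illustrate the idea, and the function names are invented for this example.

```python
from statistics import median


def gates_from_control(control_values):
    """Derive DNA-content gates from untreated control cells.

    Assumes most control cells are in G1, so the median approximates the
    G1 peak; the boundary multipliers below are illustrative only.
    """
    g1 = median(control_values)
    return [
        ("sub-G1", 0.0, 0.75 * g1),
        ("G1", 0.75 * g1, 1.25 * g1),
        ("S", 1.25 * g1, 1.75 * g1),
        ("G2/M", 1.75 * g1, 2.5 * g1),
    ]


def gate_fractions(values, gates):
    """Fraction of cells falling inside each half-open gate [lo, hi)."""
    n = len(values)
    if n == 0:
        return {name: 0.0 for name, _, _ in gates}
    return {name: sum(lo <= v < hi for v in values) / n
            for name, lo, hi in gates}
```

Reporting the fraction of cells per gate keeps the population structure that a per-well average would hide; for example, a treatment that arrests cells in G2/M shows up as a larger G2/M fraction even when the mean DNA content changes little.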


Subject(s)
Flow Cytometry/methods , Software , Databases, Factual