
Quasar: Easy Machine Learning for Biospectroscopy.

Toplak, Marko; Read, Stuart T; Sandt, Christophe; Borondics, Ferenc.

Cells ; 10(9)2021 09 03.

Article in English | MEDLINE | ID: mdl-34571947

ABSTRACT

Data volumes collected in many scientific fields have long exceeded the capacity of human comprehension. This is especially true in biomedical research where multiple replicates and techniques are required to conduct reliable studies. Ever-increasing data rates from new instruments compound our dependence on statistics to make sense of the numbers. The currently available data analysis tools lack user-friendliness, various capabilities or ease of access. Problem-specific software or scripts freely available in supplementary materials or research lab websites are often highly specialized, no longer functional, or simply too hard to use. Commercial software limits access and reproducibility, and is often unable to follow quickly changing, cutting-edge research demands. Finally, as machine learning techniques penetrate data analysis pipelines of the natural sciences, we see a growing demand for user-friendly and flexible tools to fuse machine learning with spectroscopy datasets. In our opinion, open-source software with strong community engagement is the way forward. To address these problems, we developed Quasar, an open-source and user-friendly software package. Here, we present case studies that highlight some Quasar features by analyzing infrared spectroscopy data with various machine learning techniques.

Subject(s)

Spectrum Analysis/methods , Humans , Machine Learning , Reproducibility of Results , Software

Democratized image analytics by visual programming through integration of deep models and small-scale machine learning.

Godec, Primoz; Pancur, Matjaz; Ilenic, Nejc; Copar, Andrej; Strazar, Martin; Erjavec, Ales; Pretnar, Ajda; Demsar, Janez; Staric, Anze; Toplak, Marko; Zagar, Lan; Hartman, Jan; Wang, Hamilton; Bellazzi, Riccardo; Petrovic, Uros; Garagna, Silvia; Zuccotti, Maurizio; Park, Dongsu; Shaulsky, Gad; Zupan, Blaz.

Nat Commun ; 10(1): 4551, 2019 10 07.

Article in English | MEDLINE | ID: mdl-31591416

ABSTRACT

Analysis of biomedical images requires computational expertise that is uncommon among biomedical scientists. Deep learning approaches for image analysis provide an opportunity to develop user-friendly tools for exploratory data analysis. Here, we use the visual programming toolbox Orange ( http://orange.biolab.si ) to simplify image analysis by integrating deep-learning embedding, machine learning procedures, and data visualization. Orange supports the construction of data analysis workflows by assembling components for data preprocessing, visualization, and modeling. We equipped Orange with components that use pre-trained deep convolutional networks to profile images with vectors of features. These vectors are used in image clustering and classification in a framework that enables mining of image sets for both novice and experienced users. We demonstrate the utility of the tool in image analysis of progenitor cells in mouse bone healing, identification of developmental competence in mouse oocytes, subcellular protein localization in yeast, and developmental morphology of social amoebae.

Subject(s)

Computational Biology/methods , Image Processing, Computer-Assisted/methods , Machine Learning , Neural Networks, Computer , Animals , Dictyostelium/cytology , Dictyostelium/growth & development , Dictyostelium/metabolism , Green Fluorescent Proteins/genetics , Green Fluorescent Proteins/metabolism , Internet , Life Cycle Stages , Mice, Transgenic , Oocytes/metabolism , Reproducibility of Results , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism

Assessment of machine learning reliability methods for quantifying the applicability domain of QSAR regression models.

Toplak, Marko; Mocnik, Rok; Polajnar, Matija; Bosnic, Zoran; Carlsson, Lars; Hasselgren, Catrin; Demsar, Janez; Boyer, Scott; Zupan, Blaz; Stålring, Jonna.

J Chem Inf Model ; 54(2): 431-41, 2014 Feb 24.

Article in English | MEDLINE | ID: mdl-24490838

ABSTRACT

The vastness of chemical space and the relatively small coverage by experimental data recording molecular properties require us to identify subspaces, or domains, for which we can confidently apply QSAR models. The prediction of QSAR models in these domains is reliable, and potential subsequent investigations of such compounds would find that the predictions closely match the experimental values. Standard approaches in QSAR assume that predictions are more reliable for compounds that are "similar" to those in subspaces with denser experimental data. Here, we report on a study of an alternative set of techniques recently proposed in the machine learning community. These methods quantify prediction confidence through estimation of the prediction error at the point of interest. Our study includes 20 public QSAR data sets with continuous response and assesses the quality of 10 reliability scoring methods by observing their correlation with prediction error. We show that these new alternative approaches can outperform standard reliability scores that rely only on similarity to compounds in the training set. The results also indicate that the quality of reliability scoring methods is sensitive to data set characteristics and to the regression method used in QSAR. We demonstrate that at the cost of increased computational complexity these dependencies can be leveraged by integration of scores from various reliability estimation approaches. The reliability estimation techniques described in this paper have been implemented in an open source add-on package ( https://bitbucket.org/biolab/orange-reliability ) to the Orange data mining suite.

Subject(s)

Artificial Intelligence , Drug Discovery/methods , Quantitative Structure-Activity Relationship , Algorithms , Regression Analysis , Time Factors
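
The evaluation idea in the abstract above, judging a reliability score by how well it correlates with the actual prediction error, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `pearson` and `score_quality` are hypothetical, the toy numbers are invented, and Pearson correlation is only one possible choice, since the abstract does not say which correlation measure the authors used. This is not the API of the orange-reliability package.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def score_quality(reliability_scores, predictions, measured):
    """Correlate a reliability score with the absolute prediction error.

    Assumes a HIGHER score means a LARGER expected error, so a good
    scoring method yields a correlation close to +1.
    """
    abs_errors = [abs(p - m) for p, m in zip(predictions, measured)]
    return pearson(reliability_scores, abs_errors)


# Invented toy data: four compounds with predicted and measured values.
scores = [0.1, 0.4, 0.5, 0.9]        # hypothetical reliability scores
predictions = [1.0, 2.0, 3.0, 4.0]   # hypothetical model predictions
measured = [1.1, 2.3, 3.6, 5.0]      # hypothetical experimental values

quality = score_quality(scores, predictions, measured)
print(f"correlation between score and |error|: {quality:.3f}")
```

Under this convention, a scoring method whose correlation stays near zero carries little information about where the model can be trusted, which is the sense in which the abstract compares methods.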

ABC transporters in Dictyostelium discoideum development.

Miranda, Edward Roshan; Zhuchenko, Olga; Toplak, Marko; Santhanam, Balaji; Zupan, Blaz; Kuspa, Adam; Shaulsky, Gad.

PLoS One ; 8(8): e70040, 2013.

Article in English | MEDLINE | ID: mdl-23967067

ABSTRACT

ATP-binding cassette (ABC) transporters can translocate a broad spectrum of molecules across the cell membrane including physiological cargo and toxins. ABC transporters are known for the role they play in resistance towards anticancer agents in chemotherapy of cancer patients. There are 68 ABC transporters annotated in the genome of the social amoeba Dictyostelium discoideum. We have characterized more than half of these ABC transporters through a systematic study of mutations in their genes. We have analyzed morphological and transcriptional phenotypes for these mutants during growth and development and found that most of the mutants exhibited rather subtle phenotypes. A few of the genes may share physiological functions, as reflected in their transcriptional phenotypes. Since most of the abc-transporter mutants showed subtle morphological phenotypes, we utilized these transcriptional phenotypes to identify genes that are important for development by looking for transcripts whose abundance was unperturbed in most of the mutants. We found a set of 668 genes that includes many validated D. discoideum developmental genes. We have also found that abcG6 and abcG18 may have potential roles in intercellular signaling during terminal differentiation of spores and stalks.

Subject(s)

ATP-Binding Cassette Transporters/genetics , ATP-Binding Cassette Transporters/metabolism , Dictyostelium/growth & development , Dictyostelium/metabolism , Protozoan Proteins/genetics , Protozoan Proteins/metabolism , Cell Differentiation/genetics , Dictyostelium/cytology , Dictyostelium/genetics , Mutation , Phenotype , Spores, Protozoan/cytology , Spores, Protozoan/genetics , Spores, Protozoan/growth & development , Spores, Protozoan/metabolism , Transcription, Genetic

Transcriptional profiling of Dictyostelium with RNA sequencing.

Miranda, Edward Roshan; Rot, Gregor; Toplak, Marko; Santhanam, Balaji; Curk, Tomaz; Shaulsky, Gad; Zupan, Blaz.

Methods Mol Biol ; 983: 139-71, 2013.

Article in English | MEDLINE | ID: mdl-23494306

ABSTRACT

Transcriptional profiling methods have been utilized in the analysis of various biological processes in Dictyostelium. Recent advances in high-throughput sequencing have increased the resolution and the dynamic range of transcriptional profiling. Here we describe the utility of RNA sequencing with the Illumina technology for production of transcriptional profiles. We also describe methods for data mapping and storage as well as common and specialized tools for data analysis, both online and offline.

Subject(s)

Dictyostelium/genetics , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Base Sequence , DNA Primers , DNA, Complementary/genetics , Data Mining , Dictyostelium/metabolism , Gene Library , Genome, Protozoan , High-Throughput Nucleotide Sequencing/methods , Molecular Sequence Annotation , Molecular Sequence Data , Polymerase Chain Reaction , Protozoan Proteins/genetics , Protozoan Proteins/metabolism , RNA, Messenger/genetics , RNA, Messenger/isolation & purification , RNA, Messenger/metabolism , RNA, Protozoan/genetics , RNA, Protozoan/isolation & purification , RNA, Protozoan/metabolism , Software

Does replication groups scoring reduce false positive rate in SNP interaction discovery?

Toplak, Marko; Curk, Tomaz; Demsar, Janez; Zupan, Blaz.

BMC Genomics ; 11: 58, 2010 Jan 22.

Article in English | MEDLINE | ID: mdl-20092660

ABSTRACT

BACKGROUND: Computational methods that infer single nucleotide polymorphism (SNP) interactions from phenotype data may uncover new biological mechanisms in non-Mendelian diseases. However, practical aspects of such analysis face many problems. Present experimental studies typically use SNP arrays with hundreds of thousands of SNPs but record only hundreds of samples. Candidate SNP pairs inferred by interaction analysis may include a high proportion of false positives. Recently, Gayan et al. (2008) proposed to reduce the number of false positives by combining results of interaction analysis performed on subsets of data (replication groups), rather than analyzing the entire data set directly. If performing as hypothesized, replication groups scoring could improve interaction analysis and also any type of feature ranking and selection procedure in systems biology. Because Gayan et al. do not compare their approach to the standard interaction analysis techniques, we here investigate if replication groups indeed reduce the number of reported false positive interactions.

RESULTS: A set of simulated and false interaction-imputed experimental SNP data sets were used to compare the inference of SNP-SNP interactions by means of replication groups to the standard approach where the entire data set was directly used to score all candidate SNP pairs. In all our experiments, the inference of interactions from the entire data set (i.e. without using the replication groups) reported fewer false positives.

CONCLUSIONS: Compared with the direct scoring approach, the use of replication groups does not reduce false positive rates, and may, depending on the data set, often perform worse.

Subject(s)

Computational Biology/methods , Data Interpretation, Statistical , Polymorphism, Single Nucleotide , Computer Simulation , False Positive Reactions , Models, Genetic
