Results 1 - 6 of 6
1.
Cells ; 10(9), 2021 09 03.
Article in English | MEDLINE | ID: mdl-34571947

ABSTRACT

Data volumes collected in many scientific fields have long exceeded the capacity of human comprehension. This is especially true in biomedical research, where multiple replicates and techniques are required to conduct reliable studies. Ever-increasing data rates from new instruments compound our dependence on statistics to make sense of the numbers. The currently available data analysis tools lack user-friendliness, key capabilities, or ease of access. Problem-specific software or scripts freely available in supplementary materials or on research lab websites are often highly specialized, no longer functional, or simply too hard to use. Commercial software limits access and reproducibility, and is often unable to keep up with quickly changing, cutting-edge research demands. Finally, as machine learning techniques penetrate the data analysis pipelines of the natural sciences, we see a growing demand for user-friendly and flexible tools to fuse machine learning with spectroscopy datasets. In our opinion, open-source software with strong community engagement is the way forward. To address these challenges, we develop Quasar, an open-source and user-friendly software package. Here, we present case studies to highlight some Quasar features by analyzing infrared spectroscopy data using various machine learning techniques.


Subject(s)
Spectrum Analysis/methods , Humans , Machine Learning , Reproducibility of Results , Software
2.
Nat Commun ; 10(1): 4551, 2019 10 07.
Article in English | MEDLINE | ID: mdl-31591416

ABSTRACT

Analysis of biomedical images requires computational expertise that is uncommon among biomedical scientists. Deep learning approaches for image analysis provide an opportunity to develop user-friendly tools for exploratory data analysis. Here, we use the visual programming toolbox Orange ( http://orange.biolab.si ) to simplify image analysis by integrating deep-learning embedding, machine learning procedures, and data visualization. Orange supports the construction of data analysis workflows by assembling components for data preprocessing, visualization, and modeling. We equipped Orange with components that use pre-trained deep convolutional networks to profile images with vectors of features. These vectors are used in image clustering and classification in a framework that enables mining of image sets for both novice and experienced users. We demonstrate the utility of the tool in image analysis of progenitor cells in mouse bone healing, identification of developmental competence in mouse oocytes, subcellular protein localization in yeast, and developmental morphology of social amoebae.


Subject(s)
Computational Biology/methods , Image Processing, Computer-Assisted/methods , Machine Learning , Neural Networks, Computer , Animals , Dictyostelium/cytology , Dictyostelium/growth & development , Dictyostelium/metabolism , Green Fluorescent Proteins/genetics , Green Fluorescent Proteins/metabolism , Internet , Life Cycle Stages , Mice, Transgenic , Oocytes/metabolism , Reproducibility of Results , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism
3.
J Chem Inf Model ; 54(2): 431-41, 2014 Feb 24.
Article in English | MEDLINE | ID: mdl-24490838

ABSTRACT

The vastness of chemical space and the relatively small coverage by experimental data recording molecular properties require us to identify subspaces, or domains, for which we can confidently apply QSAR models. The prediction of QSAR models in these domains is reliable, and potential subsequent investigations of such compounds would find that the predictions closely match the experimental values. Standard approaches in QSAR assume that predictions are more reliable for compounds that are "similar" to those in subspaces with denser experimental data. Here, we report on a study of an alternative set of techniques recently proposed in the machine learning community. These methods quantify prediction confidence through estimation of the prediction error at the point of interest. Our study includes 20 public QSAR data sets with continuous response and assesses the quality of 10 reliability scoring methods by observing their correlation with prediction error. We show that these new alternative approaches can outperform standard reliability scores that rely only on similarity to compounds in the training set. The results also indicate that the quality of reliability scoring methods is sensitive to data set characteristics and to the regression method used in QSAR. We demonstrate that at the cost of increased computational complexity these dependencies can be leveraged by integration of scores from various reliability estimation approaches. The reliability estimation techniques described in this paper have been implemented in an open source add-on package ( https://bitbucket.org/biolab/orange-reliability ) to the Orange data mining suite.
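
The evaluation idea in this abstract, scoring each prediction's reliability and checking how well the score tracks the actual prediction error, can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's protocol and not the API of the orange-reliability add-on: the two scores (mean distance to the nearest training points, and mean training residual among those neighbours), the linear model, the data, and all function names are assumptions chosen for brevity.

```python
import numpy as np

def nearest_neighbors(X_train, X_query, k):
    """Indices of, and Euclidean distances to, the k nearest training rows per query row."""
    diffs = X_query[:, None, :] - X_train[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    idx = np.argsort(dists, axis=1)[:, :k]
    return idx, np.take_along_axis(dists, idx, axis=1)

def similarity_score(X_train, X_query, k=5):
    """Similarity-style score: mean distance to the k nearest training points."""
    _, d = nearest_neighbors(X_train, X_query, k)
    return d.mean(axis=1)

def local_error_score(X_train, train_abs_resid, X_query, k=5):
    """Error-estimate-style score: mean absolute training residual among the k nearest points."""
    idx, _ = nearest_neighbors(X_train, X_query, k)
    return train_abs_resid[idx].mean(axis=1)

def spearman(a, b):
    """Spearman rank correlation (no tie correction; adequate for continuous scores)."""
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return float(np.corrcoef(rank(a), rank(b))[0, 1])

# Synthetic stand-in for a QSAR data set with a continuous response (assumption).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
X_test = rng.normal(scale=2.0, size=(100, 3))   # wider spread: some points lie far from training data
target = lambda X: np.sin(X[:, 0]) + 0.5 * X[:, 1]
y_train = target(X_train) + 0.1 * rng.normal(size=200)
y_test = target(X_test) + 0.1 * rng.normal(size=100)

# Ordinary least-squares linear model as the regression method (assumption).
design = lambda X: np.c_[X, np.ones(len(X))]
w, *_ = np.linalg.lstsq(design(X_train), y_train, rcond=None)
train_abs_resid = np.abs(design(X_train) @ w - y_train)
test_abs_err = np.abs(design(X_test) @ w - y_test)

scores = {
    "similarity (kNN distance)": similarity_score(X_train, X_test),
    "local error estimate": local_error_score(X_train, train_abs_resid, X_test),
}
for name, score in scores.items():
    print(f"{name}: Spearman correlation with |error| = {spearman(score, test_abs_err):.2f}")
```

A higher correlation means the score ranks unreliable predictions more faithfully. The abstract reports that which score works best depends on the data set and the regression method, so the numbers printed for this toy setup should not be read as general.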


Subject(s)
Artificial Intelligence , Drug Discovery/methods , Quantitative Structure-Activity Relationship , Algorithms , Regression Analysis , Time Factors
4.
PLoS One ; 8(8): e70040, 2013.
Article in English | MEDLINE | ID: mdl-23967067

ABSTRACT

ATP-binding cassette (ABC) transporters can translocate a broad spectrum of molecules across the cell membrane including physiological cargo and toxins. ABC transporters are known for the role they play in resistance towards anticancer agents in chemotherapy of cancer patients. There are 68 ABC transporters annotated in the genome of the social amoeba Dictyostelium discoideum. We have characterized more than half of these ABC transporters through a systematic study of mutations in their genes. We have analyzed morphological and transcriptional phenotypes for these mutants during growth and development and found that most of the mutants exhibited rather subtle phenotypes. A few of the genes may share physiological functions, as reflected in their transcriptional phenotypes. Since most of the abc-transporter mutants showed subtle morphological phenotypes, we utilized these transcriptional phenotypes to identify genes that are important for development by looking for transcripts whose abundance was unperturbed in most of the mutants. We found a set of 668 genes that includes many validated D. discoideum developmental genes. We have also found that abcG6 and abcG18 may have potential roles in intercellular signaling during terminal differentiation of spores and stalks.


Subject(s)
ATP-Binding Cassette Transporters/genetics , ATP-Binding Cassette Transporters/metabolism , Dictyostelium/growth & development , Dictyostelium/metabolism , Protozoan Proteins/genetics , Protozoan Proteins/metabolism , Cell Differentiation/genetics , Dictyostelium/cytology , Dictyostelium/genetics , Mutation , Phenotype , Spores, Protozoan/cytology , Spores, Protozoan/genetics , Spores, Protozoan/growth & development , Spores, Protozoan/metabolism , Transcription, Genetic
5.
Methods Mol Biol ; 983: 139-71, 2013.
Article in English | MEDLINE | ID: mdl-23494306

ABSTRACT

Transcriptional profiling methods have been utilized in the analysis of various biological processes in Dictyostelium. Recent advances in high-throughput sequencing have increased the resolution and the dynamic range of transcriptional profiling. Here we describe the utility of RNA sequencing with the Illumina technology for production of transcriptional profiles. We also describe methods for data mapping and storage as well as common and specialized tools for data analysis, both online and offline.


Subject(s)
Dictyostelium/genetics , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Base Sequence , DNA Primers , DNA, Complementary/genetics , Data Mining , Dictyostelium/metabolism , Gene Library , Genome, Protozoan , High-Throughput Nucleotide Sequencing/methods , Molecular Sequence Annotation , Molecular Sequence Data , Polymerase Chain Reaction , Protozoan Proteins/genetics , Protozoan Proteins/metabolism , RNA, Messenger/genetics , RNA, Messenger/isolation & purification , RNA, Messenger/metabolism , RNA, Protozoan/genetics , RNA, Protozoan/isolation & purification , RNA, Protozoan/metabolism , Software
6.
BMC Genomics ; 11: 58, 2010 Jan 22.
Article in English | MEDLINE | ID: mdl-20092660

ABSTRACT

BACKGROUND: Computational methods that infer single nucleotide polymorphism (SNP) interactions from phenotype data may uncover new biological mechanisms in non-Mendelian diseases. However, such analysis faces many practical problems. Present experimental studies typically use SNP arrays with hundreds of thousands of SNPs but record only hundreds of samples. Candidate SNP pairs inferred by interaction analysis may include a high proportion of false positives. Recently, Gayan et al. (2008) proposed to reduce the number of false positives by combining results of interaction analysis performed on subsets of data (replication groups), rather than analyzing the entire data set directly. If it performs as hypothesized, replication-group scoring could improve interaction analysis and also any type of feature ranking and selection procedure in systems biology. Because Gayan et al. do not compare their approach to the standard interaction analysis techniques, we here investigate whether replication groups indeed reduce the number of reported false positive interactions. RESULTS: A set of simulated and false interaction-imputed experimental SNP data sets was used to compare the inference of SNP-SNP interactions by means of replication groups to the standard approach where the entire data set was directly used to score all candidate SNP pairs. In all our experiments, the inference of interactions from the entire data set (i.e., without using the replication groups) reported fewer false positives. CONCLUSIONS: Compared with the direct scoring approach, the use of replication groups does not reduce false positive rates, and may, depending on the data set, often perform worse.
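
The two strategies contrasted in this abstract can be sketched on a toy data set: score every SNP pair once on all samples, or score it separately in disjoint sample subsets and combine the results. The sketch below is illustrative only and rests on several assumptions not taken from the paper: the interaction score (information-theoretic interaction gain), the combination rule (minimum across groups), the simulated genotypes, and all function names are hypothetical choices, and the run does not reproduce the study's false-positive comparison.

```python
import numpy as np
from itertools import combinations

def entropy(*cols):
    """Shannon entropy in bits of the joint distribution of discrete columns."""
    joint = np.stack(cols, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def mutual_info(x_cols, y):
    """I(X;Y) for the joint variable X formed by the columns in x_cols."""
    return entropy(*x_cols) + entropy(y) - entropy(*x_cols, y)

def interaction_gain(a, b, y):
    """I(A,B;Y) - I(A;Y) - I(B;Y): information about Y carried only by the pair."""
    return mutual_info([a, b], y) - mutual_info([a], y) - mutual_info([b], y)

def score_pairs(G, y):
    """Direct approach: score every SNP pair on the samples given."""
    return {(i, j): interaction_gain(G[:, i], G[:, j], y)
            for i, j in combinations(range(G.shape[1]), 2)}

def score_pairs_replicated(G, y, n_groups, rng):
    """Replication-group approach (assumed rule): score each pair in disjoint
    sample subsets and keep the minimum, so a pair must score well in all groups."""
    groups = np.array_split(rng.permutation(len(y)), n_groups)
    per_group = [score_pairs(G[idx], y[idx]) for idx in groups]
    return {pair: min(s[pair] for s in per_group) for pair in per_group[0]}

# Simulated data (assumption): 300 samples, 12 SNPs coded 0/1/2,
# phenotype driven by an XOR-like interaction of SNPs 0 and 1 plus 10% label noise.
rng = np.random.default_rng(1)
G = rng.integers(0, 3, size=(300, 12))
y = ((G[:, 0] > 0) ^ (G[:, 1] > 0)).astype(int)
flip = rng.random(300) < 0.10
y[flip] = 1 - y[flip]

direct = score_pairs(G, y)
replicated = score_pairs_replicated(G, y, n_groups=3, rng=rng)
for name, scores in [("direct", direct), ("replication groups", replicated)]:
    top = sorted(scores, key=scores.get, reverse=True)[:3]
    print(name, "top pairs:", top)
```

Counting false positives, as the study does, would additionally require a score threshold and many simulated data sets with known true interactions. Splitting the samples leaves each group with fewer observations per genotype combination, which makes every per-group score noisier.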


Subject(s)
Computational Biology/methods , Data Interpretation, Statistical , Polymorphism, Single Nucleotide , Computer Simulation , False Positive Reactions , Models, Genetic