Results 1 - 11 of 11
1.
Neuroimage; 120: 225-53, 2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26067346

ABSTRACT

Neuroscientific data are typically analyzed based on the behavioral responses of the participants. However, the errors made may or may not be in line with the underlying neural processing. In particular, in experiments with time pressure, or in studies where a perceptual threshold is measured, the error distribution deviates from uniformity due to the structure of the experimental setup. When the analysis is based on the behavioral labels, as is usually done, this systematic and structured (non-uniform) label noise is ignored, and the analysis is likely to arrive at wrong conclusions. This paper contributes a remedy for this important scenario: we present a novel approach for (a) measuring label noise and (b) removing structured label noise. We demonstrate its usefulness for EEG data analysis using a standard d2 test of visual attention (N=20 participants).
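The removal step can be illustrated with a deliberately simple sketch. This is not the paper's method: the data are toy stand-ins for single-trial EEG features, and a nearest-centroid rule replaces the actual estimator. Trials whose features disagree with their behavioral label are flagged as candidate label noise and removed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for single-trial EEG features: two classes, with a structured
# block of class-0 trials mislabeled as class 1 (non-uniform label noise).
X0 = rng.normal(-1.0, 0.5, size=(100, 5))
X1 = rng.normal(+1.0, 0.5, size=(100, 5))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)
noisy = y.copy()
noisy[:10] = 1  # structured label noise: only class-0 trials are affected

def flag_suspicious(X, labels):
    """Flag trials whose features sit closer to the opposite class centroid.

    A crude proxy for label-noise detection: compare each trial against the
    two class centroids computed from the (noisy) behavioral labels.
    """
    c0 = X[labels == 0].mean(axis=0)
    c1 = X[labels == 1].mean(axis=0)
    d0 = np.linalg.norm(X - c0, axis=1)
    d1 = np.linalg.norm(X - c1, axis=1)
    predicted = (d1 < d0).astype(int)
    return predicted != labels  # disagreement marks candidate noisy labels

suspect = flag_suspicious(X, noisy)
clean_X, clean_y = X[~suspect], noisy[~suspect]
print(f"flagged {suspect.sum()} of {len(y)} trials")
```

The cleaned trials would then be passed to the usual analysis pipeline.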


Subjects
Attention/physiology, Brain/physiology, Cognitive Neuroscience/methods, Electroencephalography/methods, Evoked Potentials/physiology, Unsupervised Machine Learning, Adult, Female, Humans, Male, Pattern Recognition, Visual, Young Adult
2.
Bioinformatics; 30(9): 1300-1, 2014 May 01.
Article in English | MEDLINE | ID: mdl-24413671

ABSTRACT

We present Oqtans, an open-source workbench for quantitative transcriptome analysis integrated into Galaxy. Its distinguishing features include customizable computational workflows and a modular pipeline architecture that facilitates comparative assessment of tool and data quality. Oqtans integrates an assortment of machine-learning-powered tools into Galaxy that perform on par with or better than state-of-the-art tools. The implemented tools cover a complete transcriptome analysis workflow: short-read alignment, transcript identification/quantification, and differential expression analysis. Oqtans and Galaxy facilitate persistent storage, data exchange, and documentation of intermediate results and analysis workflows. We illustrate how Oqtans aids the interpretation of data from different experiments in easy-to-understand use cases. Users can easily create their own workflows and extend Oqtans by integrating specific tools. Oqtans is available as (i) a cloud machine image with a demo instance at cloud.oqtans.org, (ii) a public Galaxy instance at galaxy.cbio.mskcc.org, and (iii) a git repository containing all installed software (oqtans.org/git), most of which is also available from (iv) the Galaxy Toolshed and (v) a share string to use with Galaxy CloudMan.


Subjects
RNA/genetics, Sequence Analysis, RNA/methods, Transcriptome, Base Sequence, Internet, Software
3.
NAR Genom Bioinform; 3(3): lqab065, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34296082

ABSTRACT

Deep learning has revolutionized data science in many fields by greatly improving prediction performance compared with conventional approaches. Recently, explainable artificial intelligence has emerged as an area of research that goes beyond pure prediction improvement by extracting knowledge from deep learning methodologies through the interpretation of their results. We investigate such explanations to explore the genetic architectures of phenotypes in genome-wide association studies. Instead of testing each position in the genome individually, our novel three-step algorithm, called DeepCOMBI, first trains a neural network to classify subjects into their respective phenotypes. Second, it explains the classifier's decisions by applying layer-wise relevance propagation, one example from the pool of explanation techniques. The resulting importance scores are then used to determine a subset of the most relevant locations for multiple hypothesis testing in the third step. The performance of DeepCOMBI in terms of power and precision is investigated on generated datasets and on a 2007 study. Verification of the latter is achieved by validating all findings against independent studies published up until 2020. DeepCOMBI is shown to outperform ordinary raw p-value thresholding and other baseline methods. Two novel disease associations (rs10889923 for hypertension, rs4769283 for type 1 diabetes) were identified.
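The three-step structure can be sketched in miniature. This is a hedged toy version, not DeepCOMBI itself: a logistic regression stands in for the deep network, the input-times-weight contribution stands in for layer-wise relevance propagation (to which LRP reduces for a linear model), and the genotype data are simulated.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy genotype matrix: 200 subjects x 50 SNP positions (0/1/2 minor-allele
# counts); only SNP 7 is truly associated with the phenotype.
X = rng.integers(0, 3, size=(200, 50)).astype(float)
y = (X[:, 7] + rng.normal(0, 0.5, 200) > 1.0).astype(float)

# Step 1 (stand-in): train a linear classifier by gradient descent on the
# mean logistic loss.  DeepCOMBI trains a deep network here.
w = np.zeros(50)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.01 * X.T @ (p - y) / len(y)

# Step 2 (stand-in): per-SNP relevance scores.  For a linear model, the
# layer-wise relevance propagation idea reduces to input * weight terms.
relevance = np.mean(np.abs(X * w), axis=0)

# Step 3: restrict multiple hypothesis testing to the top-k relevant SNPs,
# shrinking the multiple-testing burden from 50 tests to k.
k = 5
candidates = np.argsort(relevance)[-k:]
print("candidate SNPs:", sorted(candidates))
```

Only the candidate positions would then enter the statistical testing stage, which is where the power gain over raw thresholding comes from.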

4.
IEEE Trans Neural Netw Learn Syst; 31(7): 2680-2684, 2020 Jul.
Article in English | MEDLINE | ID: mdl-31494564

ABSTRACT

Many learning tasks in natural language processing, including sequence tagging, sequence segmentation, and syntactic parsing, have been successfully approached by means of structured prediction methods. An appealing property of the corresponding training algorithms is their ability to integrate the loss function of interest into the optimization process, improving the final results according to the chosen measure of performance. Here, we focus on the task of constituency parsing and show how to optimize the model for the F1-score in the max-margin framework of a structural support vector machine (SVM). For reasons of computational efficiency, it is common to binarize the corresponding grammar before training. Unfortunately, this introduces a bias during training: the loss function is evaluated on the binary representation, while the resulting performance is measured on the original unbinarized trees. We address this problem by extending the inference procedure presented by Bauer et al. Specifically, we propose an algorithmic modification that allows the loss to be evaluated on the unbinarized trees. The new approach properly models the loss function of interest, resulting in better prediction accuracy, while still benefiting from the computational efficiency of the binarized representation. The presented idea can easily be transferred to other structured loss functions.
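The quantity being optimized can be made concrete with a small sketch (not the paper's inference algorithm): the F1 loss between a gold and a predicted parse, computed on labeled constituent spans of the unbinarized trees. The nested-tuple tree encoding here is a hypothetical convenience.

```python
def constituent_spans(tree):
    """Return the set of labeled spans {(label, i, j)} covered by a tree.

    Trees are nested tuples (label, child, ...); leaves are word indices.
    This is a minimal, hypothetical tree encoding chosen for the sketch.
    """
    out = set()

    def walk(node):
        if isinstance(node, int):          # leaf: a single token position
            return node, node + 1
        label, *children = node
        lo, hi = None, None
        for c in children:
            i, j = walk(c)
            lo = i if lo is None else lo
            hi = j
        out.add((label, lo, hi))
        return lo, hi

    walk(tree)
    return out

def f1_loss(gold, predicted):
    """1 - F1 over labeled spans: the loss the training procedure should
    see, evaluated on unbinarized trees rather than their binarizations."""
    g, p = constituent_spans(gold), constituent_spans(predicted)
    if not g or not p:
        return 1.0
    prec = len(g & p) / len(p)
    rec = len(g & p) / len(g)
    return 1.0 if prec + rec == 0 else 1.0 - 2 * prec * rec / (prec + rec)

# "The old dog barked": a flat (unbinarized) NP with three children, versus
# a prediction containing a spurious inner NP left over from binarization.
gold = ("S", ("NP", 0, 1, 2), ("VP", 3))
pred = ("S", ("NP", 0, ("NP", 1, 2)), ("VP", 3))
print(f1_loss(gold, pred))
```

Evaluating this loss on the binarized trees instead would score the spurious inner NP as harmless, which is exactly the bias the abstract describes.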

5.
Sci Rep; 9(1): 20353, 2019 Dec 30.
Article in English | MEDLINE | ID: mdl-31889137

ABSTRACT

In many research areas scientists are interested in clustering objects within small datasets while making use of prior knowledge from large reference datasets. We propose a method to apply the machine learning concept of transfer learning to unsupervised clustering problems and show its effectiveness in the field of single-cell RNA sequencing (scRNA-Seq). The goal of scRNA-Seq experiments is often the definition and cataloguing of cell types from the transcriptional output of individual cells. To improve the clustering of small disease- or tissue-specific datasets, for which the identification of rare cell types is often problematic, we propose a transfer learning method to utilize large and well-annotated reference datasets, such as those produced by the Human Cell Atlas. Our approach modifies the dataset of interest while incorporating key information from the larger reference dataset via Non-negative Matrix Factorization (NMF). The modified dataset is subsequently provided to a clustering algorithm. We empirically evaluate the benefits of our approach on simulated scRNA-Seq data as well as on publicly available datasets. Finally, we present results for the analysis of a recently published small dataset and find improved clustering when transferring knowledge from a large reference dataset. Implementations of the method are available at https://github.com/nicococo/scRNA.
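The transfer idea can be sketched with a minimal NMF implementation. This is a toy illustration under strong assumptions, not the published method (available at the GitHub link above): gene programs are learned on a large reference matrix and then held fixed while factorizing the small target dataset, whose cells are clustered by their dominant program.

```python
import numpy as np

rng = np.random.default_rng(2)

def nmf(V, k, iters=200, W=None):
    """Multiplicative-update NMF: V (genes x cells) ~= W (genes x k) @ H.

    If W is given, it is held fixed; that is the transfer step.  A minimal
    sketch, not the paper's exact objective.
    """
    n, m = V.shape
    fixed = W is not None
    W = rng.random((n, k)) if W is None else W
    H = rng.random((k, m))
    eps = 1e-9
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        if not fixed:
            W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data: two "cell types" expressing disjoint gene modules.
ref = np.zeros((20, 60))
ref[:10, :30] = 5.0   # type A cells express genes 0-9
ref[10:, 30:] = 5.0   # type B cells express genes 10-19
ref += rng.random(ref.shape)

# Small target dataset containing the same two cell types.
target = np.zeros((20, 10))
target[:10, :5] = 5.0
target[10:, 5:] = 5.0
target += rng.random(target.shape)

W_ref, _ = nmf(ref, k=2)                # learn gene programs on the reference
_, H_tgt = nmf(target, k=2, W=W_ref)    # transfer: reuse programs on target
clusters = H_tgt.argmax(axis=0)         # cluster cells by dominant program
print(clusters)
```

In the real method the modified target matrix is handed to a full clustering algorithm rather than a bare argmax.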


Subjects
Cluster Analysis, Computational Biology, Gene Expression Profiling, Machine Learning, Sequence Analysis, RNA, Single-Cell Analysis, Algorithms, Computational Biology/methods, Gene Expression Profiling/methods, High-Throughput Nucleotide Sequencing, Humans, ROC Curve, Reproducibility of Results, Sequence Analysis, RNA/methods, Single-Cell Analysis/methods, Transcriptome
6.
IEEE Trans Neural Netw Learn Syst; 29(9): 3994-4006, 2018 Sep.
Article in English | MEDLINE | ID: mdl-28961127

ABSTRACT

We present ClusterSVDD, a methodology that unifies support vector data descriptions (SVDDs) and k-means clustering into a single formulation. This allows both methods to benefit from one another: SVDDs gain flexibility through the use of multiple spheres, while k-means gains anomaly resistance and kernel-based flexibility. In particular, our approach leads to a new interpretation of k-means as a regularized mode-seeking algorithm. The unifying formulation further allows new algorithms to be derived by transferring knowledge from one-class learning settings to clustering settings and vice versa. As a showcase, we derive a clustering method for structured data based on a one-class learning scenario. Additionally, our formulation can be solved via a particularly simple optimization scheme. We evaluate our approach empirically to highlight some of the proposed benefits on artificially generated data as well as on real-world problems, and we provide a Python software package comprising various implementations of primal and dual SVDD as well as our proposed ClusterSVDD.
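The multiple-spheres idea can be sketched as follows. This is a hedged toy version, not the paper's optimization: a k-means-style alternation, after which each cluster gets a sphere whose radius covers a fixed fraction of its points, so points outside every sphere are treated as anomalies.

```python
import numpy as np

rng = np.random.default_rng(3)

def cluster_svdd(X, k, nu=0.9, iters=20):
    """Minimal multi-sphere data description in the spirit of ClusterSVDD
    (not the paper's exact formulation).

    Alternates nearest-center assignment and mean updates, then fits a
    sphere per cluster whose radius covers a nu-fraction of its points.
    """
    # Deterministic initialization for the sketch: spread centers over X.
    centers = X[np.linspace(0, len(X) - 2, k).astype(int)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        centers = np.array([X[assign == j].mean(axis=0) for j in range(k)])
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    assign = d.argmin(axis=1)
    radii = np.array([np.quantile(d[assign == j, j], nu) for j in range(k)])
    outliers = d[np.arange(len(X)), assign] > radii[assign]
    return assign, outliers

# Two well-separated blobs plus one far-away anomaly.
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
               rng.normal(5.0, 0.3, (50, 2)),
               [[20.0, 20.0]]])
assign, outliers = cluster_svdd(X, k=2)
print("anomalies flagged:", outliers.sum())
```

With nu = 1 and radii ignored, the loop above is plain k-means, which hints at how the two methods share one formulation.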

7.
IEEE Trans Pattern Anal Mach Intell; 40(12): 2841-2852, 2018 Dec.
Article in English | MEDLINE | ID: mdl-29989981

ABSTRACT

In order to solve large-scale lasso problems, screening algorithms have been developed that discard features with zero coefficients based on a computationally efficient screening rule. Most existing screening rules were developed from a spherical constraint and half-space constraints on a dual optimal solution. However, existing rules admit at most two half-space constraints, due to the computational cost the half-spaces incur, even though additional constraints could discard more features. In this paper, we present AdaScreen, an adaptive lasso screening rule ensemble, which allows any single sphere to be combined with multiple half-space constraints on a dual optimal solution. Thanks to geometrical considerations that lead to a simple closed-form solution for AdaScreen, we can incorporate multiple half-space constraints at small computational cost. In our experiments, we show that AdaScreen with multiple half-space constraints simultaneously improves screening performance and speeds up lasso solvers.
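For intuition, here is the single-sphere special case that ensembles like AdaScreen generalize: a basic SAFE-style sphere test for the lasso. The data are synthetic, and this is a sketch of the classical rule, not of AdaScreen itself.

```python
import numpy as np

rng = np.random.default_rng(4)

def sphere_screen(X, y, lam):
    """Basic SAFE-style sphere test for the lasso
        min_w 0.5 * ||y - X w||^2 + lam * ||w||_1.

    Feature j can be safely discarded (its coefficient is provably zero)
    when  |x_j^T y| < lam - ||x_j|| * ||y|| * (lam_max - lam) / lam_max.
    """
    corr = np.abs(X.T @ y)
    lam_max = corr.max()          # smallest lam giving the all-zero solution
    bound = (lam - np.linalg.norm(X, axis=0) * np.linalg.norm(y)
             * (lam_max - lam) / lam_max)
    return corr < bound           # True -> feature discarded before solving

# Sparse toy problem: only the first 3 of 100 features matter.
X = rng.normal(size=(80, 100))
X /= np.linalg.norm(X, axis=0)
w_true = np.zeros(100)
w_true[:3] = [3.0, -2.0, 2.5]
y = X @ w_true + 0.01 * rng.normal(size=80)

discard = sphere_screen(X, y, lam=0.9 * np.abs(X.T @ y).max())
print(f"screened out {discard.sum()} of 100 features")
```

Each additional half-space constraint shrinks the dual region further, tightening `bound` and discarding more features; combining several of them cheaply is AdaScreen's contribution.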


Subjects
Algorithms, Image Processing, Computer-Assisted/methods, Machine Learning, Computer Simulation, Databases, Factual, Humans
8.
IEEE Trans Neural Netw Learn Syst; 29(7): 2743-2756, 2018 Jul.
Article in English | MEDLINE | ID: mdl-28541228

ABSTRACT

Analyzing data with latent spatial and/or temporal structure is a challenge for machine learning. In this paper, we propose a novel nonlinear model for studying data with latent dependence structure. It combines the concepts of Markov random fields, transductive learning, and regression, making heavy use of the notion of joint feature maps. Our transductive conditional random field regression model is able to infer the latent states by combining limited labeled data of high precision with unlabeled data containing measurement uncertainty. In this manner, we can propagate accurate information and greatly reduce uncertainty. We demonstrate the usefulness of our framework on generated time series data with known temporal structure, and we validate it on synthetic as well as real-world offshore data from the oil industry with spatial structure, predicting rock porosities from acoustic impedance data.

9.
PLoS One; 12(3): e0174392, 2017.
Article in English | MEDLINE | ID: mdl-28346487

ABSTRACT

High prediction accuracies are not the only objective to consider when solving problems using machine learning. Instead, particular scientific applications require some explanation of the learned prediction function. For computational biology, positional oligomer importance matrices (POIMs) have been successfully applied to explain the decision of support vector machines (SVMs) using weighted-degree (WD) kernels. To extract relevant biological motifs from POIMs, the motifPOIM method has been devised and showed promising results on real-world data. Our contribution in this paper is twofold: as an extension to POIMs, we propose gPOIM, a general measure of feature importance for arbitrary learning machines and feature sets (including, but not limited to, SVMs and CNNs) and devise a sampling strategy for efficient computation. As a second contribution, we derive a convex formulation of motifPOIMs that leads to more reliable motif extraction from gPOIMs. Empirical evaluations confirm the usefulness of our approach on artificially generated data as well as on real-world datasets.
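The sampling strategy behind such a general importance measure can be sketched generically. This is a hedged, model-agnostic toy in the spirit of gPOIM, not the paper's estimator: for each feature, resample its values and record how much the learner's output moves.

```python
import numpy as np

rng = np.random.default_rng(5)

def sampled_importance(predict, X, n_samples=200):
    """Sampling-based feature importance for an arbitrary learning machine.

    For each feature j, permute its column (a crude resampling of its
    marginal) and average the absolute change in the predictor's output.
    Works for any black-box `predict`, which is the point of a general
    measure like gPOIM.
    """
    base = predict(X)
    scores = np.zeros(X.shape[1])
    reps = n_samples // X.shape[1] + 1
    for j in range(X.shape[1]):
        diffs = []
        for _ in range(reps):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])   # resample feature j only
            diffs.append(np.mean(np.abs(predict(Xp) - base)))
        scores[j] = np.mean(diffs)
    return scores

# A black-box learner whose output depends only on features 0 and 3.
predict = lambda X: 2.0 * X[:, 0] - 3.0 * X[:, 3]
X = rng.normal(size=(100, 6))
scores = sampled_importance(predict, X)
print(np.argsort(scores)[-2:])
```

Because the measure only queries `predict`, the same code would accept an SVM decision function or a CNN forward pass in place of the toy lambda.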


Subjects
Computational Biology/methods, Machine Learning, Support Vector Machine, Algorithms
10.
PLoS One; 10(12): e0144782, 2015.
Article in English | MEDLINE | ID: mdl-26690911

ABSTRACT

Identifying discriminative motifs underlying the functionality and evolution of organisms is a major challenge in computational biology. Machine learning approaches such as support vector machines (SVMs) achieve state-of-the-art performance in genomic discrimination tasks, but, due to their black-box character, the motifs underlying their decision functions are largely unknown. As a remedy, positional oligomer importance matrices (POIMs) allow us to visualize the significance of position-specific subsequences. Although a major step towards the explanation of trained SVM models, they suffer from the fact that their size grows exponentially with the length of the motif, which renders manual inspection feasible only for comparably small motif sizes, typically k ≤ 5. In this work, we extend the work on positional oligomer importance matrices by presenting a new machine-learning methodology, entitled motifPOIM, to extract the truly relevant motifs underlying the predictions of a trained SVM model, regardless of their length and complexity. Our framework treats the motifs as free parameters in a probabilistic model, a task which can be phrased as a non-convex optimization problem. The exponential dependence of the POIM size on the oligomer length poses a major numerical challenge, which we address with an efficient optimization framework that allows us to find possibly overlapping motifs consisting of up to hundreds of nucleotides. We demonstrate the efficacy of our approach on a synthetic dataset as well as a real-world human splice site dataset.


Subjects
Machine Learning, Models, Genetic, Nucleotide Motifs, Sequence Analysis, DNA/methods, Humans
11.
IEEE Trans Neural Netw Learn Syst; 25(5): 870-81, 2014 May.
Article in English | MEDLINE | ID: mdl-24808034

ABSTRACT

The task of structured output prediction deals with learning general functional dependencies between arbitrary input and output spaces. In this context, two loss-sensitive formulations for maximum-margin training have been proposed in the literature, referred to as margin rescaling and slack rescaling, respectively. The latter is believed to be more accurate and easier to handle. Nevertheless, it is not popular due to the lack of known efficient inference algorithms; therefore, margin rescaling, which requires a similar type of inference as standard structured prediction, is the most frequently used approach. Focusing on the task of label sequence learning, we define a general framework that can handle a large class of inference problems based on Hamming-like loss functions and the concept of decomposability for the underlying joint feature map. In particular, we present an efficient generic algorithm that can handle both rescaling approaches and is guaranteed to find an optimal solution in polynomial time.
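For concreteness, here is margin rescaling with a plain Hamming loss in the simplest decomposable case: a label sequence model with per-position (unary) scores only. This is a textbook-style sketch, not the paper's generic algorithm; with pairwise terms the position-wise argmax below becomes a Viterbi pass over the augmented scores.

```python
import numpy as np

def loss_augmented_argmax(scores, y_true):
    """Solve  argmax_y [ score(y) + Hamming(y, y_true) ].

    scores: (T, L) array of per-position label scores.
    Because the Hamming loss decomposes over positions, adding +1 to every
    wrong label's score and taking a position-wise argmax is exact here;
    this is the inference step margin rescaling needs at training time.
    """
    T, L = scores.shape
    augmented = scores + 1.0
    augmented[np.arange(T), y_true] -= 1.0   # no loss for the true label
    return augmented.argmax(axis=1)

scores = np.array([[2.0, 1.8, 0.0],
                   [0.0, 5.0, 0.1],
                   [1.0, 1.0, 1.1]])
y_true = np.array([0, 1, 2])
y_hat = loss_augmented_argmax(scores, y_true)
print(y_hat)
```

Positions where `y_hat` disagrees with `y_true` are exactly those whose score margin over a competing label is below 1; these margin violations drive the max-margin updates.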
