Results 1 - 11 of 11
1.
Neural Comput ; 34(3): 595-641, 2022 02 17.
Article in English | MEDLINE | ID: mdl-35026002

ABSTRACT

The presence of manifolds is a common assumption in many applications, including astronomy and computer vision. For instance, in astronomy, low-dimensional stellar structures, such as streams, shells, and globular clusters, can be found in the neighborhood of big galaxies such as the Milky Way. Since these structures are often buried in very large data sets, an algorithm that can not only recover the manifold but also remove the background noise (or outliers) is highly desirable. Other works try to recover manifolds either by pushing all points toward manifolds or by downsampling from dense regions; because each aims to solve only one of the problems, they generally fail to suppress the noise on manifolds and remove background noise simultaneously. Inspired by the collective behavior of biological ants in the food-seeking process, we propose a new algorithm that employs several random walkers equipped with a local alignment measure to detect and denoise manifolds. During the walking process, the agents release pheromone on data points, which reinforces future movements. Over time the pheromone concentrates on the manifolds, while it fades in the background noise due to an evaporation procedure. We use the Markov chain (MC) framework to provide a theoretical analysis of the convergence of the algorithm and its performance. Moreover, an empirical analysis, based on synthetic and real-world data sets, is provided to demonstrate its applicability in different areas, such as improving the performance of t-distributed stochastic neighbor embedding (t-SNE) and spectral clustering using the underlying MC formulas, recovering astronomical low-dimensional structures, and improving the performance of the fast Parzen window density estimator.
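The pheromone mechanism described in this abstract can be illustrated with a heavily simplified sketch. It is not the authors' algorithm: the local alignment measure is omitted entirely, the k-nearest-neighbor graph is an assumption, and the function name `pheromone_walk` and all parameters (`n_neighbors`, `n_walkers`, `n_steps`, `n_rounds`, `deposit`, `evaporation`) are hypothetical. It only shows walkers stepping between neighboring data points with probability proportional to pheromone, depositing pheromone where they land, and a global evaporation step after each round.

```python
import numpy as np

def pheromone_walk(points, n_neighbors=5, n_walkers=10, n_steps=50,
                   n_rounds=20, deposit=1.0, evaporation=0.1, seed=0):
    """Simplified pheromone-reinforced random walk on a kNN graph.

    Assumption: the paper's local alignment measure is NOT modeled here;
    transitions depend on pheromone alone.
    """
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    n = len(points)

    # Brute-force k nearest neighbors from squared Euclidean distances.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(d2, axis=1)[:, :n_neighbors]

    pheromone = np.ones(n)
    for _ in range(n_rounds):
        for _ in range(n_walkers):
            current = rng.integers(n)  # random starting point
            for _ in range(n_steps):
                candidates = neighbors[current]
                weights = pheromone[candidates]
                # Step to a neighbor with probability proportional to pheromone.
                current = rng.choice(candidates, p=weights / weights.sum())
                pheromone[current] += deposit  # reinforce the visited point
        pheromone *= 1.0 - evaporation  # evaporation after each round
    return pheromone
```

In a sketch like this, points with persistently low pheromone would be the candidates for removal as background noise; how well that works without the alignment measure is not something this example establishes.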


Subjects
Ants; Algorithms; Animals; Cluster Analysis
2.
Dev Psychopathol ; 33(3): 980-991, 2021 08.
Article in English | MEDLINE | ID: mdl-32571444

ABSTRACT

Less is known about the relationship between conduct disorder (CD), callous-unemotional (CU) traits, and positive and negative parenting in youth compared to early childhood. We combined traditional univariate analyses with a novel machine learning classifier (Angle-based Generalized Matrix Learning Vector Quantization) to classify youth (N = 756; 9-18 years) into typically developing (TD) or CD groups with or without elevated CU traits (CD/HCU, CD/LCU, respectively) using youth- and parent-reports of parenting behavior. At the group level, both CD/HCU and CD/LCU were associated with high negative and low positive parenting relative to TD. However, only positive parenting differed between the CD/HCU and CD/LCU groups. In classification analyses, performance was best when distinguishing CD/HCU from TD groups and poorest when distinguishing CD/HCU from CD/LCU groups. Positive and negative parenting were both relevant when distinguishing CD/HCU from TD, negative parenting was most relevant when distinguishing between CD/LCU and TD, and positive parenting was most relevant when distinguishing CD/HCU from CD/LCU groups. These findings suggest that while positive parenting distinguishes between CD/HCU and CD/LCU, negative parenting is associated with both CD subtypes. These results highlight the importance of considering multiple parenting behaviors in CD with varying levels of CU traits in late childhood/adolescence.


Subjects
Conduct Disorder; Adolescent; Child; Child, Preschool; Emotions; Empathy; Humans; Parenting
3.
J Theor Biol ; 455: 222-231, 2018 10 14.
Article in English | MEDLINE | ID: mdl-30048717

ABSTRACT

To understand trends in individual responses to medication, one can take a purely data-driven machine learning approach, or alternatively apply pharmacokinetics combined with mixed-effects statistical modelling. To take advantage of the predictive power of machine learning and the explanatory power of pharmacokinetics, we propose a latent variable mixture model for learning clusters of pharmacokinetic models, demonstrated on a clinical data set investigating the activity of 11β-hydroxysteroid dehydrogenase (11β-HSD) enzymes in healthy adults. The proposed strategy automatically constructs different population models that are not based on prior knowledge or experimental design, but result naturally as mixture component models of the global latent variable mixture model. We study the parameters of the underlying multi-compartment ordinary differential equation model via identifiability analysis on the observable measurements, which reveals that the model is structurally locally identifiable. Further approximation with a perturbation technique enables efficient training of the proposed probabilistic latent variable mixture clustering technique using Expectation Maximization. Training on the clinical data results in 4 clusters reflecting the prednisone conversion rate over a period of 4 h, based on venous blood samples taken at 20-min intervals. The learned clusters differ in prednisone absorption as well as prednisone/prednisolone conversion. In the discussion section we include a detailed investigation of the relationship between the pharmacokinetic parameters of the trained cluster models, looking for possible or plausible physiological explanations, and a correlation analysis using additional phenotypic participant measurements.


Subjects
Glucocorticoids/pharmacokinetics; Models, Biological; Prednisolone/pharmacokinetics; Prednisone/pharmacokinetics; 11-beta-Hydroxysteroid Dehydrogenases/metabolism; Adult; Aged; Female; Glucocorticoids/administration & dosage; Humans; Machine Learning; Middle Aged; Prednisolone/administration & dosage; Prednisone/administration & dosage
4.
Cell Syst ; 5(5): 485-497.e3, 2017 11 22.
Article in English | MEDLINE | ID: mdl-28988802

ABSTRACT

We report the results of a DREAM challenge designed to predict relative genetic essentialities based on a novel dataset testing 98,000 shRNAs against 149 molecularly characterized cancer cell lines. We analyzed the results of over 3,000 submissions over a period of 4 months. We found that algorithms combining essentiality data across multiple genes demonstrated increased accuracy; gene expression was the most informative molecular data type; the identity of the gene being predicted was far more important than the modeling strategy; well-predicted genes and selected molecular features showed enrichment in functional categories; and frequently selected expression features correlated with survival in primary tumors. This study establishes benchmarks for gene essentiality prediction, presents a community resource for future comparison with this benchmark, and provides insights into factors influencing the ability to predict gene essentiality from functional genetic screens. This study also demonstrates the value of releasing pre-publication data publicly to engage the community in an open research collaboration.


Subjects
Gene Expression/genetics; Genes, Essential/genetics; Algorithms; Cell Line, Tumor; Genomics/methods; Humans; RNA, Small Interfering/genetics
6.
Nat Commun ; 7: 12460, 2016 08 23.
Article in English | MEDLINE | ID: mdl-27549343

ABSTRACT

Rheumatoid arthritis (RA) affects millions worldwide. While anti-TNF treatment is widely used to reduce disease progression, treatment fails in approximately one-third of patients. No biomarker currently exists that identifies non-responders before treatment. A rigorous community-based assessment of the utility of SNP data for predicting anti-TNF treatment efficacy in RA patients was performed in the context of a DREAM Challenge (http://www.synapse.org/RA_Challenge). An open challenge framework enabled the comparative evaluation of predictions developed by 73 research groups using the most comprehensive available data and covering a wide range of state-of-the-art modelling methodologies. Despite a significant genetic heritability estimate for the treatment non-response trait (h² = 0.18, P = 0.02), no significant genetic contribution to prediction accuracy is observed. Results formally confirm the expectations of the rheumatology community that SNP information does not significantly improve predictive performance relative to standard clinical traits, thereby justifying a refocusing of future efforts on collection of other data.


Subjects
Antibodies, Monoclonal, Humanized/therapeutic use; Arthritis, Rheumatoid/drug therapy; Genetic Predisposition to Disease/genetics; Polymorphism, Single Nucleotide; Tumor Necrosis Factor-alpha/antagonists & inhibitors; Adult; Aged; Antibodies, Monoclonal/therapeutic use; Antirheumatic Agents/therapeutic use; Arthritis, Rheumatoid/genetics; Arthritis, Rheumatoid/pathology; Certolizumab Pegol/therapeutic use; Cohort Studies; Crowdsourcing; Female; Humans; Male; Middle Aged; Prognosis; Treatment Outcome; Tumor Necrosis Factor-alpha/immunology
7.
Bioinformatics ; 32(16): 2457-63, 2016 08 15.
Article in English | MEDLINE | ID: mdl-27153643

ABSTRACT

MOTIVATION: Modelling methods that find structure in data are necessary with the current large volumes of genomic data, and there have been various efforts to find subsets of genes exhibiting consistent patterns over subsets of treatments. These biclustering techniques have focused on one data source, often gene expression data. We present a Bayesian approach for joint biclustering of multiple data sources, extending a recent method, Group Factor Analysis, to have a biclustering interpretation with additional sparsity assumptions. The resulting method enables data-driven detection of linear structure present in parts of the data sources. RESULTS: Our simulation studies show that the proposed method reliably infers biclusters from heterogeneous data sources. We tested the method on data from the NCI-DREAM drug sensitivity prediction challenge, resulting in excellent prediction accuracy. Moreover, the predictions are based on several biclusters which provide insight into the data sources, in this case gene expression, DNA methylation, protein abundance, exome sequence, functional connectivity fingerprints and drug sensitivity. AVAILABILITY AND IMPLEMENTATION: http://research.cs.aalto.fi/pml/software/GFAsparse/ CONTACT: kerstin.bunte@googlemail.com or samuel.kaski@aalto.fi.


Subjects
Algorithms; Cluster Analysis; Datasets as Topic; Gene Expression Profiling; Bayes Theorem; Factor Analysis, Statistical; Information Storage and Retrieval; Oligonucleotide Array Sequence Analysis
8.
PLoS One ; 8(3): e59401, 2013.
Article in English | MEDLINE | ID: mdl-23527184

ABSTRACT

Flow cytometry is a widely used technique for the analysis of cell populations in the study and diagnosis of human diseases. It yields large amounts of high-dimensional data, the analysis of which would clearly benefit from efficient computational approaches aiming at automated diagnosis and decision support. This article presents our analysis of flow cytometry data in the framework of the DREAM6/FlowCAP2 Molecular Classification of Acute Myeloid Leukemia (AML) Challenge, 2011. In the challenge, example data were provided for a set of 179 subjects, comprising healthy donors and 23 cases of AML. The participants were asked to provide predictions with respect to the condition of 180 patients in a test set. We extracted feature vectors from the data in terms of single marker statistics, including characteristic moments, median and interquartile range of the observed values. Subsequently, we applied Generalized Matrix Relevance Learning Vector Quantization (GMLVQ), a machine learning technique which extends standard LVQ by an adaptive distance measure. Our method achieved the best possible performance with respect to the diagnoses of test set patients. The extraction of features from the flow cytometry data is outlined in detail, the machine learning approach is discussed and classification results are presented. In addition, we illustrate how GMLVQ can provide deeper insight into the problem by allowing one to infer the relevance of specific markers and features for the diagnosis.
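The single-marker feature extraction mentioned in this abstract might look roughly like the following sketch. The abstract names "characteristic moments, median and interquartile range" but does not say which moments were used; the choice of mean, standard deviation, skewness and kurtosis here is an assumption, as are the function name `marker_features` and the input layout (one row per measured cell, one column per marker).

```python
import numpy as np

def marker_features(events):
    """Summarize one subject's events (rows = cells, columns = markers)
    as a fixed-length vector of per-marker statistics.

    Assumption: the moments used are mean, std, skewness and kurtosis.
    """
    events = np.asarray(events, dtype=float)
    mean = events.mean(axis=0)
    std = events.std(axis=0)
    centered = events - mean
    safe_std = np.where(std > 0, std, 1.0)  # avoid division by zero
    skew = (centered ** 3).mean(axis=0) / safe_std ** 3
    kurt = (centered ** 4).mean(axis=0) / safe_std ** 4
    median = np.median(events, axis=0)
    q75, q25 = np.percentile(events, [75, 25], axis=0)
    iqr = q75 - q25
    # Layout: [means, stds, skews, kurtoses, medians, IQRs]
    return np.concatenate([mean, std, skew, kurt, median, iqr])
```

Each subject then becomes one fixed-length vector regardless of how many cells were measured, which is what a prototype-based classifier such as GMLVQ needs as input.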


Subjects
Artificial Intelligence; Computational Biology/methods; Decision Support Techniques; Diagnosis, Computer-Assisted/methods; Flow Cytometry/methods; Leukemia, Myeloid, Acute/diagnosis; Biomarkers; Humans
9.
Artif Intell Med ; 56(2): 91-7, 2012 Oct.
Article in English | MEDLINE | ID: mdl-23010586

ABSTRACT

OBJECTIVE: Generalized matrix learning vector quantization (GMLVQ) is used to estimate the relevance of texture features in their ability to classify interstitial lung disease patterns in high-resolution computed tomography images. METHODOLOGY: After a stochastic gradient descent, the GMLVQ algorithm provides a discriminative distance measure of relevance factors, which can account for pairwise correlations between different texture features and their importance for the classification of healthy and diseased patterns. 65 texture features were extracted from gray-level co-occurrence matrices (GLCMs). These features were ranked and selected according to their relevance obtained by GMLVQ and, for comparison, according to a mutual information (MI) criterion. The classification performance for different feature subsets was calculated for a k-nearest-neighbor (kNN) classifier, a random forests classifier (RanForest), and support vector machines with a linear and a radial basis function kernel (SVMlin and SVMrbf). RESULTS: For all classifiers, feature sets selected by the relevance ranking assessed by GMLVQ had a significantly better classification performance (p<0.05) for many texture feature sets compared to the MI approach. For kNN, RanForest, and SVMrbf, some of these feature subsets had a significantly better classification performance when compared to the set consisting of all features (p<0.05). CONCLUSION: While this approach estimates the relevance of single features, future considerations of GMLVQ should include the pairwise correlation for the feature ranking, e.g. to reduce the redundancy of two equally relevant features.
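A relevance ranking of single features, as described in this abstract, can be sketched as follows. In GMLVQ the learned matrix Ω defines a relevance matrix Λ = ΩᵀΩ; reading the diagonal entries of Λ as per-feature relevances is the usual interpretation, but the abstract does not spell out the exact ranking rule, so treating the normalized diagonal as the ranking score is an assumption here. The function name `rank_features` is hypothetical.

```python
import numpy as np

def rank_features(omega):
    """Rank features by the diagonal of the relevance matrix Lambda = Omega^T Omega.

    Assumption: single-feature relevance is the diagonal of Lambda,
    normalized so that the relevances sum to one.
    """
    omega = np.asarray(omega, dtype=float)
    lam = omega.T @ omega                    # relevance matrix
    relevance = np.diag(lam) / np.trace(lam)
    order = np.argsort(relevance)[::-1]      # most relevant feature first
    return order, relevance
```

The off-diagonal entries of Λ, which this sketch ignores, carry the pairwise correlations that the abstract's conclusion suggests including in future rankings.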


Subjects
Algorithms; Lung Diseases, Interstitial/classification; Lung Diseases, Interstitial/diagnosis; Tomography, X-Ray Computed/methods; Cluster Analysis; Humans; Support Vector Machine
10.
Neural Netw ; 26: 159-73, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22041220

ABSTRACT

We present an extension of the recently introduced Generalized Matrix Learning Vector Quantization algorithm. In the original scheme, adaptive square matrices of relevance factors parameterize a discriminative distance measure. We extend the scheme to matrices of limited rank corresponding to low-dimensional representations of the data. This makes it possible to incorporate prior knowledge of the intrinsic dimension and to reduce the number of adaptive parameters efficiently. In particular, for very high-dimensional data, the limitation of the rank can reduce computation time and memory requirements significantly. Furthermore, two- or three-dimensional representations constitute an efficient visualization method for labeled data sets. The identification of a suitable projection is not treated as a pre-processing step but as an integral part of the supervised training. Several real-world data sets serve as an illustration and demonstrate the usefulness of the suggested method.
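The limited-rank distance described in this abstract can be illustrated with a short sketch. It assumes the standard GMLVQ form d(x, w) = (x − w)ᵀ ΩᵀΩ (x − w), with a rectangular Ω of shape (rank, n_features) replacing the square matrix; the function names and the example values are hypothetical, and training of Ω and the prototypes is not shown.

```python
import numpy as np

def gmlvq_distance(x, w, omega):
    """Squared distance (x - w)^T Omega^T Omega (x - w).

    omega has shape (rank, n_features); rank < n_features gives the
    limited-rank variant, and omega @ x is the low-dimensional
    representation of x.
    """
    projected = omega @ (np.asarray(x, float) - np.asarray(w, float))
    return float(projected @ projected)

def n_matrix_parameters(n_features, rank):
    """Number of adaptive matrix entries for a (rank x n_features) Omega."""
    return rank * n_features
```

With 1000 features, for instance, a rank-2 Ω has 2,000 adaptive entries instead of 1,000,000 for the square matrix, and `omega @ x` yields two coordinates per sample that can be plotted directly.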


Subjects
Artificial Intelligence; Learning; Algorithms; Discriminant Analysis; Humans; Pattern Recognition, Automated
11.
IEEE Trans Neural Netw ; 21(5): 831-40, 2010 May.
Article in English | MEDLINE | ID: mdl-20236882

ABSTRACT

In this paper, we present a regularization technique to extend recently proposed matrix learning schemes in learning vector quantization (LVQ). These learning algorithms extend the concept of adaptive distance measures in LVQ to the use of relevance matrices. In general, metric learning can display a tendency towards oversimplification in the course of training. An overly pronounced elimination of dimensions in feature space can have negative effects on the performance and may lead to instabilities in the training. We focus on matrix learning in generalized LVQ (GLVQ). Extending the cost function by an appropriate regularization term prevents the unfavorable behavior and can help to improve the generalization ability. The approach is first tested and illustrated in terms of artificial model data. Furthermore, we apply the scheme to benchmark classification data sets from the UCI Repository of Machine Learning. We also demonstrate the usefulness of regularization in the case of rank-limited relevance matrices, i.e., matrix learning with an implicit, low-dimensional representation of the data.
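The abstract does not state the form of the regularization term. One natural choice that penalizes the elimination of dimensions is a log-determinant penalty, −(μ/2) ln det(ΩΩᵀ), which grows without bound as Ω approaches a singular matrix. The sketch below assumes that form purely for illustration; the function name `logdet_penalty` and the parameter `mu` are hypothetical, and Ω is assumed to have full row rank.

```python
import numpy as np

def logdet_penalty(omega, mu=1.0):
    """Assumed regularizer: -(mu / 2) * ln det(Omega Omega^T).

    Added to the cost function, it becomes large when Omega is close
    to losing rank, i.e. when feature-space dimensions are being
    eliminated. Requires Omega to have full row rank.
    """
    omega = np.asarray(omega, dtype=float)
    sign, logabsdet = np.linalg.slogdet(omega @ omega.T)
    if sign <= 0:
        raise ValueError("Omega Omega^T must be positive definite")
    return -0.5 * mu * logabsdet
```

An identity Ω gives a penalty of zero, while shrinking one direction toward zero raises the penalty, which is the behavior that would counteract oversimplification during training.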


Subjects
Artificial Intelligence; Feedback; Learning/physiology; Neural Networks, Computer; Algorithms; Humans