1.
Methods ; 132: 34-41, 2018 01 01.
Article in English | MEDLINE | ID: mdl-28684340

ABSTRACT

Can we use graph mining algorithms to find patterns in tumor molecular mechanisms? Can we model disease progression with multiple time-specific graph comparison algorithms? This paper focuses on these questions. Our main contributions are that (1) we propose the Temporal-Omics (Temp-O) workflow to model tumor progression in non-small cell lung cancer (NSCLC) using graph comparisons between multiple stage-specific graphs, and (2) we show that temporal structures are meaningful in the tumor progression of NSCLC. Other identified temporal structures that are not highlighted in this paper may also be used to gain insight into possible novel mechanisms. Importantly, the Temp-O workflow is generic; while we applied it to NSCLC, it can be applied to other cancers and diseases. We used gene expression data from tumor samples across disease stages to model lung cancer progression, creating stage-specific tumor graphs. Validating our findings in independent datasets showed that differences in temporal network structures capture diverse mechanisms in NSCLC. Furthermore, the results showed that the structures are consistent and potentially biologically important: genes with similar protein names were captured in the same cliques, for all cliques in all datasets. Importantly, the identified temporal structures are meaningful in the tumor progression of NSCLC, as they agree with the molecular mechanisms of tumor progression or carcinogenesis in NSCLC. In particular, the identified major histocompatibility complex class II temporal structures capture mechanisms concerning carcinogenesis; the proteasome temporal structures capture mechanisms of the early or late stages of lung cancer; and the ribosomal cliques capture the role of ribosome biosynthesis in cancer development and sustainment. Further, on a large independent dataset, we validated that temporal network structures identified proteins that are prognostic for overall survival in NSCLC adenocarcinoma.


Subject(s)
Carcinoma, Non-Small-Cell Lung/pathology , Lung Neoplasms/pathology , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Carcinoma, Non-Small-Cell Lung/genetics , Carcinoma, Non-Small-Cell Lung/metabolism , Carcinoma, Non-Small-Cell Lung/mortality , Disease Progression , Gene Regulatory Networks , Humans , Kaplan-Meier Estimate , Lung Neoplasms/genetics , Lung Neoplasms/metabolism , Lung Neoplasms/mortality , Models, Biological , Molecular Sequence Annotation , Transcriptome
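
The abstract of entry 1 describes comparing cliques between stage-specific graphs but does not give the details of the Temp-O workflow. The sketch below is only an illustration of that general idea, not the authors' method: it enumerates maximal cliques with the basic Bron-Kerbosch recursion and reports which cliques appear or disappear between two stages. The helper names (`build_graph`, `maximal_cliques`), the gene labels `G1` to `G5`, and the two toy edge lists are all hypothetical.

```python
def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    found = set()

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-processed vertices
        if not p and not x:
            found.add(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


# Hypothetical stage-specific graphs (placeholder gene labels, made-up edges).
early = build_graph([("G1", "G2"), ("G2", "G3"), ("G1", "G3"), ("G3", "G4")])
late = build_graph([("G1", "G2"), ("G2", "G3"), ("G1", "G3"),
                    ("G3", "G4"), ("G4", "G5"), ("G3", "G5")])

early_cliques = maximal_cliques(early)
late_cliques = maximal_cliques(late)

# Cliques present in only one stage are candidate "temporal structures".
gained = late_cliques - early_cliques
lost = early_cliques - late_cliques
print("gained:", [sorted(c) for c in gained])
print("lost:", [sorted(c) for c in lost])
```

In this toy case the triangle `G1, G2, G3` is shared by both stages, while the pair `G3, G4` grows into the triangle `G3, G4, G5` in the later stage. For real expression data, a library implementation of clique enumeration would likely be preferable to this plain recursion.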
2.
J Integr Neurosci ; 15(3): 381-402, 2016 Sep.
Article in English | MEDLINE | ID: mdl-27774837

ABSTRACT

We propose a nonlinear dynamic model for invasive electroencephalogram analysis that learns the optimal parameters of the neural population model via the Levenberg-Marquardt algorithm. We introduce the crucial windows where the estimated parameters present patterns before seizure onset. The optimal parameters minimize the error between the observed signal and the signal generated by the model. The proposed approach effectively discriminates between healthy signals and epileptic seizure signals. We evaluate the proposed method using an electroencephalogram dataset with normal and epileptic seizure sequences. The empirical results show that the parameters exhibit patterns as a seizure approaches and that the method is efficient in analyzing nonlinear epilepsy electroencephalogram data. The accuracy of estimating the optimal parameters is improved by using the nonlinear dynamic model.


Subject(s)
Brain/diagnostic imaging , Electroencephalography/methods , Epilepsy/diagnostic imaging , Nonlinear Dynamics , Pattern Recognition, Automated/methods , Signal Processing, Computer-Assisted , Algorithms , Brain/physiopathology , Brain/surgery , Datasets as Topic , Electrodes, Implanted , Epilepsy/physiopathology , Epilepsy/surgery , Humans , Seizures/diagnostic imaging , Seizures/physiopathology , Seizures/surgery
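
Entry 2 fits model parameters by minimizing the error between an observed and a generated signal with the Levenberg-Marquardt algorithm. The neural population model itself is not given in the abstract, so the sketch below swaps in a deliberately simple stand-in, y = a * exp(-b * t), purely to show the damped least-squares update. The function names, the starting values, and the synthetic noiseless data are assumptions made for illustration.

```python
import math


def model(a, b, t):
    # Stand-in signal model (NOT the neural population model of the paper).
    return a * math.exp(-b * t)


def sse(a, b, ts, ys):
    """Sum of squared errors between generated and observed signal."""
    return sum((model(a, b, t) - y) ** 2 for t, y in zip(ts, ys))


def levenberg_marquardt(a, b, ts, ys, lam=1e-2, iters=100):
    """Two-parameter Levenberg-Marquardt with multiplicative damping."""
    for _ in range(iters):
        res = [model(a, b, t) - y for t, y in zip(ts, ys)]
        # Jacobian columns: d(model)/da and d(model)/db.
        ja = [math.exp(-b * t) for t in ts]
        jb = [-a * t * math.exp(-b * t) for t in ts]
        aa = sum(x * x for x in ja)
        bb = sum(x * x for x in jb)
        ab = sum(x * y for x, y in zip(ja, jb))
        ga = sum(x * r for x, r in zip(ja, res))
        gb = sum(x * r for x, r in zip(jb, res))
        # Solve (J^T J + lam * diag(J^T J)) * delta = -J^T r for the 2x2 case.
        m11, m22 = aa * (1 + lam), bb * (1 + lam)
        det = m11 * m22 - ab * ab
        if det == 0:
            break
        da = (-ga * m22 + gb * ab) / det
        db = (-gb * m11 + ga * ab) / det
        if sse(a + da, b + db, ts, ys) < sse(a, b, ts, ys):
            # Step reduced the error: accept it and trust the model more.
            a, b, lam = a + da, b + db, lam / 10
        else:
            # Step failed: increase damping (closer to gradient descent).
            lam *= 10
    return a, b


# Synthetic noiseless "observed" signal generated with a=2.0, b=1.0.
ts = [0.5 * i for i in range(9)]
ys = [model(2.0, 1.0, t) for t in ts]
a_hat, b_hat = levenberg_marquardt(1.0, 0.5, ts, ys)
print(a_hat, b_hat)
```

A step is accepted only when it lowers the squared error, so the fit never gets worse from one iteration to the next. In practice, an established optimizer would normally be used in place of this hand-rolled 2x2 solve.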
3.
Big Data ; 4(3): 179-91, 2016 09.
Article in English | MEDLINE | ID: mdl-27642720

ABSTRACT

Multiaspect data are ubiquitous in modern Big Data applications. For instance, different aspects of a social network are the different types of communication between people, the time stamp of each interaction, and the location associated with each individual. How can we jointly model all those aspects and leverage the additional information that they introduce to our analysis? Tensors, which are multidimensional extensions of matrices, are a principled and mathematically sound way of modeling such multiaspect data. In this article, our goal is to popularize tensors and tensor decompositions among Big Data practitioners by demonstrating their effectiveness, outlining challenges that pertain to their application in Big Data scenarios, and presenting our recent work that tackles those challenges. We view this work as a step toward a fully automated, unsupervised tensor mining tool that can be easily and broadly adopted by practitioners in academia and industry.


Subject(s)
Data Mining , Computer Simulation , Machine Learning
4.
Stat Anal Data Min ; 9(4): 269-290, 2016 Aug.
Article in English | MEDLINE | ID: mdl-27672406

ABSTRACT

How can we correlate the neural activity in the human brain, as it responds to typed words, with properties of these terms (like 'edible', 'fits in hand')? In short, we want to find latent variables that jointly explain both the brain activity and the behavioral responses. This is one of many settings of the Coupled Matrix-Tensor Factorization (CMTF) problem. Can we enhance any CMTF solver so that it can operate on potentially very large datasets that may not fit in main memory? We introduce Turbo-SMT, a meta-method capable of doing exactly that: it boosts the performance of any CMTF algorithm and parallelizes it, producing sparse and interpretable solutions (up to 65 fold). Additionally, we improve upon ALS, the work-horse algorithm for CMTF, with respect to efficiency and robustness to missing values. We apply Turbo-SMT to BrainQ, a dataset consisting of a (nouns, brain voxels, human subjects) tensor and a (nouns, properties) matrix, with coupling along the nouns dimension. Turbo-SMT is able to find meaningful latent variables, as well as to predict brain activity with competitive accuracy. Finally, we demonstrate the generality of Turbo-SMT by applying it to a Facebook dataset (users, 'friends', wall-postings); there, Turbo-SMT spots spammer-like anomalies.
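
For readers unfamiliar with CMTF, the formulation commonly used in the literature can be written as follows. This is a sketch under assumed notation, not necessarily the exact objective optimized in entry 4: $\mathcal{X}$ is the (nouns, voxels, subjects) tensor, $Y$ is the (nouns, properties) matrix, $R$ is the number of latent components, and the factor matrix $A$ with columns $a_r$ is shared along the coupled nouns mode.

```latex
\min_{A,B,C,D}\;
\Bigl\| \mathcal{X} - \sum_{r=1}^{R} a_r \circ b_r \circ c_r \Bigr\|_F^2
\;+\;
\bigl\| Y - A D^{\top} \bigr\|_F^2
```

Here $\circ$ denotes the vector outer product, and $b_r$, $c_r$ are the columns of the voxel and subject factor matrices $B$ and $C$. Because $A$ appears in both terms, each latent component has to explain the brain activity and the noun properties jointly.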

5.
PLoS One ; 11(3): e0151027, 2016.
Article in English | MEDLINE | ID: mdl-26974560

ABSTRACT

Complex networks have been shown to exhibit universal properties, with one of the most consistent patterns being the scale-free degree distribution, but are there regularities obeyed by the r-hop neighborhood in real networks? We answer this question by identifying another power-law pattern that describes the relationship between the fraction of node pairs C(r) within r hops and the hop count r. This scale-free distribution is pervasive and describes a large variety of networks, ranging from social and urban to technological and biological networks. In particular, inspired by the definition of the fractal correlation dimension D2 on a point-set, we consider the hop-count r to be the underlying distance metric between two vertices of the network, and we examine the scaling of C(r) with r. We find that this relationship follows a power-law in real networks within the range 2 ≤ r ≤ d, where d is the effective diameter of the network, that is, the 90th-percentile distance. We term this relationship the power-hop and the corresponding power-law exponent the power-hop exponent h. We provide theoretical justification for this pattern under successful existing network models, and we analyze a large set of real and synthetic network datasets to show the pervasiveness of the power-hop.


Subject(s)
Models, Theoretical , Social Support , Humans
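
The quantities in entry 5 can be computed directly on a small graph. The sketch below is an illustration under stated assumptions rather than the authors' code: it uses breadth-first search to obtain C(r), the fraction of node pairs within r hops, and then estimates an exponent as the least-squares slope of log C(r) against log r over a chosen hop range. The function names and the five-node path graph are hypothetical, and a graph this small cannot show a real power law.

```python
import math
from collections import deque


def hop_pair_fractions(adj):
    """Return {r: C(r)}, the fraction of node pairs within r hops."""
    n = len(adj)
    total_pairs = n * (n - 1) / 2
    exact = {}  # hop distance -> number of ordered pairs at that distance
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for d in dist.values():
            if d > 0:
                exact[d] = exact.get(d, 0) + 1
    fractions, running = {}, 0
    for r in sorted(exact):
        running += exact[r]
        # Each unordered pair was counted twice (once from each endpoint).
        fractions[r] = (running / 2) / total_pairs
    return fractions


def loglog_slope(fractions, r_min, r_max):
    """Least-squares slope of log C(r) versus log r for r_min <= r <= r_max."""
    pts = [(math.log(r), math.log(c))
           for r, c in fractions.items() if r_min <= r <= r_max]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    num = sum((x - mx) * (y - my) for x, y in pts)
    den = sum((x - mx) ** 2 for x, _ in pts)
    return num / den


# Toy path graph 0-1-2-3-4 (hypothetical example data).
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
c = hop_pair_fractions(path)
h = loglog_slope(c, 2, 4)
print(c, h)
```

On the path graph, 4 of the 10 node pairs are adjacent, so C(1) is 0.4, and C(r) rises to 1.0 at r = 4. For a real network, the fit range would follow the abstract's 2 ≤ r ≤ d, with d taken as the 90th-percentile distance.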
6.
J R Soc Interface ; 11(96): 20140283, 2014 Jul 06.
Article in English | MEDLINE | ID: mdl-24789562

ABSTRACT

Network robustness is an important principle in biology and engineering. Previous studies of global networks have identified both redundancy and sparseness as topological properties used by robust networks. By focusing on molecular subnetworks, or modules, we show that module topology is tightly linked to the level of environmental variability (noise) the module expects to encounter. Modules internal to the cell that are less exposed to environmental noise are more connected and less robust than external modules. A similar design principle is used by several other biological networks. We propose a simple change to the evolutionary gene duplication model which gives rise to the rich range of module topologies observed within real networks. We apply these observations to evaluate and design communication networks that are specifically optimized for noisy or malicious environments. Combined, joint analysis of biological and computational networks leads to novel algorithms and insights benefiting both fields.


Subject(s)
Computer Communication Networks , Saccharomyces cerevisiae/genetics , Gene Duplication , Gene Regulatory Networks , Saccharomyces cerevisiae/metabolism , Signal Transduction , Systems Biology
7.
Big Data ; 2(4): 216-29, 2014 Dec.
Article in English | MEDLINE | ID: mdl-27442756

ABSTRACT

Given a simple noun such as apple, and a question such as "Is it edible?," what processes take place in the human brain? More specifically, given the stimulus, what are the interactions between (groups of) neurons (also known as functional connectivity) and how can we automatically infer those interactions, given measurements of the brain activity? Furthermore, how does this connectivity differ across different human subjects? In this work, we show that this problem, even though originating from the field of neuroscience, can benefit from big data techniques; we present a simple, novel good-enough brain model, or GeBM in short, and a novel algorithm Sparse-SysId, which are able to effectively model the dynamics of the neuron interactions and infer the functional connectivity. Moreover, GeBM is able to simulate basic psychological phenomena such as habituation and priming (whose definition we provide in the main text). We evaluate GeBM by using real brain data. GeBM produces brain activity patterns that are strikingly similar to the real ones, where the inferred functional connectivity is able to provide neuroscientific insights toward a better understanding of the way that neurons interact with each other, as well as detect regularities and outliers in multisubject brain activity measurements.

8.
Proc SIAM Int Conf Data Min ; 2014: 118-126, 2014.
Article in English | MEDLINE | ID: mdl-26473087

ABSTRACT

How can we correlate the neural activity in the human brain as it responds to typed words, with properties of these terms (like 'edible', 'fits in hand')? In short, we want to find latent variables, that jointly explain both the brain activity, as well as the behavioral responses. This is one of many settings of the Coupled Matrix-Tensor Factorization (CMTF) problem. Can we accelerate any CMTF solver, so that it runs within a few minutes instead of tens of hours to a day, while maintaining good accuracy? We introduce TURBO-SMT, a meta-method capable of doing exactly that: it boosts the performance of any CMTF algorithm, by up to 200×, along with an up to 65 fold increase in sparsity, with comparable accuracy to the baseline. We apply TURBO-SMT to BRAINQ, a dataset consisting of a (nouns, brain voxels, human subjects) tensor and a (nouns, properties) matrix, with coupling along the nouns dimension. TURBO-SMT is able to find meaningful latent variables, as well as to predict brain activity with competitive accuracy.

9.
IEEE Trans Cybern ; 44(1): 54-65, 2014 Jan.
Article in English | MEDLINE | ID: mdl-23757533

ABSTRACT

Suppose that someone is starting research on a topic that is unfamiliar to them. Which seminal papers have influenced the topic the most? What is the genealogy of the seminal papers on this topic? These are the questions they may raise, and we try to answer them in this paper. First, we propose an algorithm that finds a set of seminal papers on a given topic. We also address the performance and scalability issues of this sophisticated algorithm. Next, we discuss measures for deciding how much one paper is influenced by another. Then, we propose an algorithm that constructs a genealogy of the seminal papers by using the influence measure and citation information. Finally, through extensive experiments with a large volume of real-world academic literature data, we show the effectiveness and efficiency of our approach.

10.
Comput Math Methods Med ; 2013: 545613, 2013.
Article in English | MEDLINE | ID: mdl-23710252

ABSTRACT

Recently, data with complex characteristics, such as epilepsy electroencephalography (EEG) time series, have emerged. Epilepsy EEG data have special characteristics including nonlinearity, nonnormality, and nonperiodicity. Therefore, it is important to find a suitable forecasting method that covers these special characteristics. In this paper, we propose a coercively adjusted autoregression (CA-AR) method that forecasts future values from a multivariable epilepsy EEG time series. We use the technique of random coefficients, which forcefully adjusts the coefficients with -1 and 1. The fractal dimension is used to determine the order of the CA-AR model. We applied the CA-AR method, which reflects the special characteristics of the data, to forecast future values of epilepsy EEG data. Experimental results show that, compared to previous methods, the proposed method can forecast faster and more accurately.


Subject(s)
Diagnosis, Computer-Assisted/statistics & numerical data , Electroencephalography/statistics & numerical data , Epilepsy/diagnosis , Models, Neurological , Computational Biology , Humans , Regression Analysis
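
Entry 10 builds on autoregression, but the abstract does not specify the coercive coefficient adjustment or the fractal-dimension order selection. As background only, the sketch below shows a plain first-order autoregressive forecast fitted by least squares. It is explicitly not CA-AR, it is univariate where the paper's series is multivariable, and the function names and the toy series are hypothetical.

```python
def fit_ar1(series):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] + noise."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return num / den


def forecast(series, phi, steps):
    """Iterate the fitted recursion forward from the last observed value."""
    out, last = [], series[-1]
    for _ in range(steps):
        last = phi * last
        out.append(last)
    return out


# Toy series that halves at every step (made-up data, not EEG).
series = [1.0, 0.5, 0.25, 0.125]
phi = fit_ar1(series)
future = forecast(series, phi, 2)
print(phi, future)
```

A higher-order model would regress each value on several previous values, which is where an order-selection rule such as the one mentioned in the abstract would come in.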
11.
Neuroimage ; 58(2): 537-48, 2011 Sep 15.
Article in English | MEDLINE | ID: mdl-21729758

ABSTRACT

The traditional approach to functional image analysis models images as matrices of raw voxel intensity values. Although such a representation is widely utilized and heavily entrenched both within neuroimaging and in the wider data mining community, the strong interactions among space, time, and categorical modes such as subject and experimental task inherent in functional imaging yield a dataset with "high-order" structure, which matrix models are incapable of exploiting. Reasoning across all of these modes of data concurrently requires a high-order model capable of representing relationships between all modes of the data in tandem. We thus propose to model functional MRI data using tensors, which are high-order generalizations of matrices equivalent to multidimensional arrays or data cubes. However, several unique challenges exist in the high-order analysis of functional medical data: naïve tensor models are incapable of exploiting spatiotemporal locality patterns, standard tensor analysis techniques exhibit poor efficiency, and mixtures of numeric and categorical modes of data are very often present in neuroimaging experiments. Formulating the problem of image clustering as a form of Latent Semantic Analysis and using the WaveCluster algorithm as a baseline, we propose a comprehensive hybrid tensor and wavelet framework for clustering, concept discovery, and compression of functional medical images which successfully addresses these challenges. Our approach reduced runtime and dataset size on a 9.3GB finger opposition motor task fMRI dataset by up to 98% while exhibiting improved spatiotemporal coherence relative to standard tensor, wavelet, and voxel-based approaches. 
Our clustering technique was capable of automatically differentiating between the frontal areas of the brain responsible for task-related habituation and the motor regions responsible for executing the motor task, in contrast to a widely used fMRI analysis program, SPM, which only detected the latter region. Furthermore, our approach discovered latent concepts suggestive of subject handedness nearly 100× faster than standard approaches. These results suggest that a high-order model is an integral component to accurate scalable functional neuroimaging.


Subject(s)
Image Processing, Computer-Assisted/methods , Magnetic Resonance Imaging/methods , Adult , Algorithms , Cluster Analysis , Data Interpretation, Statistical , Data Mining , Diffusion Tensor Imaging , Factor Analysis, Statistical , Female , Fuzzy Logic , Humans , Image Processing, Computer-Assisted/statistics & numerical data , Magnetic Resonance Imaging/statistics & numerical data , Male , Models, Statistical , Principal Component Analysis , Wavelet Analysis
12.
Bioinformatics ; 26(12): i47-56, 2010 Jun 15.
Article in English | MEDLINE | ID: mdl-20529936

ABSTRACT

MOTIVATION: Microarray profiling of mRNA abundance is often ill suited for temporal-spatial analysis of gene expressions in multicellular organisms such as Drosophila. Recent progress in image-based genome-scale profiling of whole-body mRNA patterns via in situ hybridization (ISH) calls for development of accurate and automatic image analysis systems to facilitate efficient mining of complex temporal-spatial mRNA patterns, which will be essential for functional genomics and network inference in higher organisms. RESULTS: We present SPEX(2), an automatic system for embryonic ISH image processing, which can extract, transform, compare, classify and cluster spatial gene expression patterns in Drosophila embryos. Our pipeline for gene expression pattern extraction outputs the precise spatial locations and strengths of the gene expression. We performed experiments on the largest publicly available collection of Drosophila ISH images, and show that our method achieves excellent performance in automatic image annotation, and also finds clusters that are significantly enriched, both for gene ontology functional annotations, and for annotation terms from a controlled vocabulary used by human curators to describe these images. AVAILABILITY: Software will be available at http://www.sailing.cs.cmu.edu/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Drosophila/embryology , Drosophila/genetics , Gene Expression , Image Processing, Computer-Assisted/methods , In Situ Hybridization/methods , RNA, Messenger/analysis , Software , Animals , Gene Expression Profiling/methods , RNA, Messenger/metabolism
13.
Bioinformatics ; 24(13): i250-8, 2008 Jul 01.
Article in English | MEDLINE | ID: mdl-18586722

ABSTRACT

MOTIVATION: Protein complexes integrate multiple gene products to coordinate many biological functions. Given a graph representing pairwise protein interaction data one can search for subgraphs representing protein complexes. Previous methods for performing such search relied on the assumption that complexes form a clique in that graph. While this assumption is true for some complexes, it does not hold for many others. New algorithms are required in order to recover complexes with other types of topological structure. RESULTS: We present an algorithm for inferring protein complexes from weighted interaction graphs. By using graph topological patterns and biological properties as features, we model each complex subgraph by a probabilistic Bayesian network (BN). We use a training set of known complexes to learn the parameters of this BN model. The log-likelihood ratio derived from the BN is then used to score subgraphs in the protein interaction graph and identify new complexes. We applied our method to protein interaction data in yeast. As we show our algorithm achieved a considerable improvement over clique based algorithms in terms of its ability to recover known complexes. We discuss some of the new complexes predicted by our algorithm and determine that they likely represent true complexes. AVAILABILITY: Matlab implementation is available on the supporting website: www.cs.cmu.edu/~qyj/SuperComplex.


Subject(s)
Algorithms , Cluster Analysis , Models, Biological , Protein Interaction Mapping/methods , Proteome/metabolism , Signal Transduction/physiology , Computer Simulation