Search | BVS Infectious and Parasitic Diseases

Interpretable online network dictionary learning for inferring long-range chromatin interactions.

Rana, Vishal; Peng, Jianhao; Pan, Chao; Lyu, Hanbaek; Cheng, Albert; Kim, Minji; Milenkovic, Olgica.

PLoS Comput Biol ; 20(5): e1012095, 2024 May.

Article in English | MEDLINE | ID: mdl-38753877

ABSTRACT

Dictionary learning (DL), implemented via matrix factorization (MF), is commonly used in computational biology to tackle ubiquitous clustering problems. The method is favored due to its conceptual simplicity and relatively low computational complexity. However, DL algorithms produce results that lack interpretability in terms of real biological data. Additionally, they are not optimized for graph-structured data and hence often fail to handle them in a scalable manner. In order to address these limitations, we propose a novel DL algorithm called online convex network dictionary learning (online cvxNDL). Unlike classical DL algorithms, online cvxNDL is implemented via MF and designed to handle extremely large datasets by virtue of its online nature. Importantly, it enables the interpretation of dictionary elements, which serve as cluster representatives, through convex combinations of real measurements. Moreover, the algorithm can be applied to data with a network structure by incorporating specialized subnetwork sampling techniques. To demonstrate the utility of our approach, we apply cvxNDL on 3D-genome RNAPII ChIA-Drop data with the goal of identifying important long-range interaction patterns (long-range dictionary elements). ChIA-Drop probes higher-order interactions, and produces data in the form of hypergraphs whose nodes represent genomic fragments. The hyperedges represent observed physical contacts. Our hypergraph model analysis has the objective of creating an interpretable dictionary of long-range interaction patterns that accurately represent global chromatin physical contact maps. Through the use of dictionary information, one can also associate the contact maps with RNA transcripts and infer cellular functions. To accomplish the task at hand, we focus on RNAPII-enriched ChIA-Drop data from Drosophila melanogaster S2 cell lines. Our results offer two key insights.
First, we demonstrate that online cvxNDL retains the accuracy of classical DL (MF) methods while simultaneously ensuring unique interpretability and scalability. Second, we identify distinct collections of proximal and distal interaction patterns involving chromatin elements shared by related processes across different chromosomes, as well as patterns unique to specific chromosomes. To associate the dictionary elements with biological properties of the corresponding chromatin regions, we employ Gene Ontology (GO) enrichment analysis and perform multiple RNA coexpression studies.
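
As a rough illustration of the interpretability claim in the abstract above (dictionary elements expressed as convex combinations of real measurements), the following Python sketch builds one element from three toy measurement vectors. The function name `convex_combination`, the toy data, and the fixed weights are assumptions made for illustration only. This is not the online cvxNDL algorithm, which also has to learn the weights.

```python
def convex_combination(samples, weights):
    """Return sum_i weights[i] * samples[i], requiring the weights to lie on the simplex."""
    if len(samples) != len(weights):
        raise ValueError("need exactly one weight per sample")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    dim = len(samples[0])
    return [sum(w * s[j] for w, s in zip(weights, samples)) for j in range(dim)]


# Toy "real measurements" (hypothetical): three flattened contact patterns.
samples = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]
# Hypothetical weights; a learning algorithm would estimate these from data.
weights = [0.5, 0.25, 0.25]

element = convex_combination(samples, weights)
print(element)
```

Because the weights are nonnegative and sum to one, every entry of the element stays within the range spanned by the observed measurements, which is what makes it traceable back to specific data points.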

Subjects

Algorithms, Chromatin, Computational Biology, Drosophila melanogaster, Chromatin/genetics, Chromatin/chemistry, Chromatin/metabolism, Computational Biology/methods, Drosophila melanogaster/genetics, Animals, Machine Learning

Learning low-rank latent mesoscale structures in networks.

Lyu, Hanbaek; Kureh, Yacoub H; Vendrow, Joshua; Porter, Mason A.

Nat Commun ; 15(1): 224, 2024 Jan 03.

Article in English | MEDLINE | ID: mdl-38172092

ABSTRACT

Researchers in many fields use networks to represent interactions between entities in complex systems. To study the large-scale behavior of complex systems, it is useful to examine mesoscale structures in networks as building blocks that influence such behavior. In this paper, we present an approach to describe low-rank mesoscale structures in networks. We find that many real-world networks possess a small set of latent motifs that effectively approximate most subgraphs at a fixed mesoscale. Such low-rank mesoscale structures allow one to reconstruct networks by approximating subgraphs of a network using combinations of latent motifs. Employing subgraph sampling and nonnegative matrix factorization enables the discovery of these latent motifs. The ability to encode and reconstruct networks using a small set of latent motifs has many applications in network analysis, including network comparison, network denoising, and edge inference.
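
The pipeline named in the abstract above (subgraph sampling followed by nonnegative matrix factorization) might be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: subgraphs are sampled with plain random walks, the factorization uses standard multiplicative updates, and the helper names `sample_path_patches` and `nmf`, the ring-graph example, and all parameter values are hypothetical.

```python
import numpy as np


def sample_path_patches(adj, k, n_samples, rng):
    """Sample k-step random walks and return each walk's flattened k x k adjacency patch."""
    n = adj.shape[0]
    patches = []
    while len(patches) < n_samples:
        walk = [int(rng.integers(n))]
        for _ in range(k - 1):
            nbrs = np.flatnonzero(adj[walk[-1]])
            if nbrs.size == 0:  # dead end: discard this walk
                break
            walk.append(int(rng.choice(nbrs)))
        if len(walk) == k:
            patches.append(adj[np.ix_(walk, walk)].ravel())
    return np.array(patches, dtype=float).T  # shape (k*k, n_samples)


def nmf(X, rank, n_iter, rng, eps=1e-10):
    """Factor X ~ W @ H with nonnegative W, H using multiplicative updates."""
    W = rng.random((X.shape[0], rank))
    H = rng.random((rank, X.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


rng = np.random.default_rng(0)
n, k, rank = 30, 4, 3

# Toy network (assumption): an undirected ring on n nodes.
adj = np.zeros((n, n), dtype=int)
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1

X = sample_path_patches(adj, k, 200, rng)
W, H = nmf(X, rank, 200, rng)

# Each column of W, reshaped to k x k, plays the role of one latent motif.
motifs = W.T.reshape(rank, k, k)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(motifs.shape, round(float(rel_err), 3))
```

Reshaping the columns of `W` back to k x k matrices gives candidate motifs, and `W @ H` approximates the sampled patches, which is the ingredient a reconstruction step would build on.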

Learning to predict synchronization of coupled oscillators on randomly generated graphs.

Bassi, Hardeep; Yim, Richard P; Vendrow, Joshua; Koduluka, Rohith; Zhu, Cherlin; Lyu, Hanbaek.

Sci Rep ; 12(1): 15056, 2022 Sep 05.

Article in English | MEDLINE | ID: mdl-36065054

ABSTRACT

Suppose we are given a system of coupled oscillators on an unknown graph along with the trajectory of the system during some period. Can we predict whether the system will eventually synchronize? Even with a known underlying graph structure, this is an important yet analytically intractable question in general. In this work, we take an alternative approach to the synchronization prediction problem by viewing it as a classification problem based on the fact that any given system will eventually synchronize or converge to a non-synchronizing limit cycle. By only using some basic statistics of the underlying graphs such as edge density and diameter, our method can achieve perfect accuracy when there is a significant difference in the topology of the underlying graphs between the synchronizing and the non-synchronizing examples. However, in the problem setting where these graph statistics cannot distinguish the two classes very well (e.g., when the graphs are generated from the same random graph model), we find that pairing a few iterations of the initial dynamics along with the graph statistics as the input to our classification algorithms can lead to significant improvement in accuracy, far exceeding what is known from classical oscillator theory. More surprisingly, we find that in almost all such settings, dropping out the basic graph statistics and training our algorithms with only initial dynamics achieves nearly the same accuracy. We demonstrate our method on three models of continuous and discrete coupled oscillators: the Kuramoto model, Firefly Cellular Automata, and the Greenberg-Hastings model. Finally, we also propose an "ensemble prediction" algorithm that successfully scales our method to large graphs by training on dynamics observed from multiple random subgraphs.
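
To make the classification setup in the abstract above concrete, the Python sketch below shows how one training example could be generated for the Kuramoto model: the first few iterations of the dynamics serve as the input features, and the label records whether the system ends up synchronized. Everything here is an assumption for illustration (identical natural frequencies, forward-Euler integration, the 0.99 order-parameter threshold, and the names `kuramoto_step`, `order_parameter`, and `simulate`); the paper's actual features and classifiers are not reproduced.

```python
import math


def kuramoto_step(theta, neighbors, coupling, dt):
    """One forward-Euler step of identical-frequency Kuramoto dynamics on a graph."""
    return [
        (t + dt * coupling * sum(math.sin(theta[j] - t) for j in neighbors[i]))
        % (2 * math.pi)
        for i, t in enumerate(theta)
    ]


def order_parameter(theta):
    """Magnitude of the mean phase vector: 1.0 means full phase synchronization."""
    n = len(theta)
    re = sum(math.cos(t) for t in theta) / n
    im = sum(math.sin(t) for t in theta) / n
    return math.hypot(re, im)


def simulate(theta0, neighbors, coupling=1.0, dt=0.05, steps=2000):
    """Return the list of phase configurations visited, including the initial one."""
    theta = list(theta0)
    trajectory = [theta]
    for _ in range(steps):
        theta = kuramoto_step(theta, neighbors, coupling, dt)
        trajectory.append(theta)
    return trajectory


# Toy graph (assumption): complete graph on 5 nodes, as adjacency lists.
n = 5
neighbors = {i: [j for j in range(n) if j != i] for i in range(n)}
theta0 = [0.0, 0.4, 0.9, 1.3, 2.0]  # hypothetical initial phases

trajectory = simulate(theta0, neighbors)

# Features: the first few iterations of the dynamics, flattened.
features = [phase for state in trajectory[:5] for phase in state]
# Label: whether the final configuration is (nearly) phase synchronized.
label = order_parameter(trajectory[-1]) > 0.99
print(len(features), label)
```

Repeating this over many random graphs and initial conditions would give a labeled dataset on which any standard classifier could be trained.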

Subjects

Algorithms
