Search | Biblioteca Virtual en Salud Odontología. Uruguay

Identifying gene expression programs in single-cell RNA-seq data using linear correlation explanation.

Nussbaum, Yulia I; Hossain, K S M Tozammel; Kaifi, Jussuf; Warren, Wesley C; Shyu, Chi-Ren; Mitchem, Jonathan B.

J Biomed Inform ; 154: 104644, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38631462

ABSTRACT

OBJECTIVE: Gene expression analysis through single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of gene regulation in diverse cell types, tissues, and organisms. While existing methods primarily focus on identifying cell type-specific gene expression programs (GEPs), the characterization of GEPs associated with biological processes and stimuli responses remains limited. In this study, we aim to infer biologically meaningful GEPs that are associated with both cellular phenotypes and activity programs directly from scRNA-seq data. METHODS: We applied linear CorEx, a machine-learning-based approach, to infer GEPs by grouping genes based on a total correlation optimization function in simulated and real-world scRNA-seq datasets. Additionally, we utilized a transfer learning approach to project CorEx-inferred GEPs to other scRNA-seq datasets. RESULTS: By leveraging total correlation optimization, linear CorEx groups genes and demonstrates superior performance in identifying cell types and activity programs compared to similar methods using simulated data. Furthermore, we apply this same approach to real-world scRNA-seq data from the mouse dentate gyrus and embryonic colon development, uncovering biologically relevant GEPs related to cell types, developmental ages, and cell cycle programs. We also demonstrate the potential for transfer learning by evaluating similar datasets, showcasing the cross-species sensitivity of linear CorEx. CONCLUSION: Our findings validate linear CorEx as a valuable tool for comprehensively analyzing complex signals in scRNA-seq data, leading to deeper insights into gene expression dynamics, cellular heterogeneity, and regulatory mechanisms.

Subject(s)

Machine Learning; RNA-Seq; Single-Cell Gene Expression Analysis; Animals; Humans; Mice; Algorithms; Colon/metabolism; Colon/cytology; Computational Biology/methods; Dentate Gyrus/metabolism; Gene Expression Profiling/methods; RNA-Seq/methods
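
The abstract above groups genes by optimizing total correlation. As a rough illustration of that quantity only, the sketch below computes total correlation for a set of variables under a Gaussian assumption, where it reduces to -0.5 * log det(R) for the correlation matrix R. It does not implement linear CorEx. The function name `gaussian_total_correlation`, the simulated "genes", and the noise level are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def gaussian_total_correlation(X):
    """Total correlation (in nats) of the columns of X, assuming joint Gaussianity.

    Under that assumption TC = -0.5 * log det(R), where R is the
    correlation matrix of the columns. TC is zero when columns are
    uncorrelated and grows as they share more linear dependence.
    """
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)
    return -0.5 * logdet

# Toy data (hypothetical): 2000 "cells", 4 "genes" per group.
rng = np.random.default_rng(0)
n_cells = 2000

# Genes driven by one shared activity program plus independent noise.
program = rng.normal(size=n_cells)
coexpressed = np.column_stack(
    [program + 0.5 * rng.normal(size=n_cells) for _ in range(4)]
)

# Genes with no shared program.
independent = rng.normal(size=(n_cells, 4))

tc_co = gaussian_total_correlation(coexpressed)
tc_ind = gaussian_total_correlation(independent)
print(f"co-expressed group TC: {tc_co:.3f} nats")
print(f"independent group TC:  {tc_ind:.3f} nats")
```

In this toy setting the co-expressed group should have a clearly higher total correlation than the independent group, which is the kind of signal a gene-grouping objective built on total correlation can exploit.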

Identifying geopolitical event precursors using attention-based LSTMs.

Hossain, K S M Tozammel; Harutyunyan, Hrayr; Ning, Yue; Kennedy, Brendan; Ramakrishnan, Naren; Galstyan, Aram.

Front Artif Intell ; 5: 893875, 2022.

Article in English | MEDLINE | ID: mdl-36388399

ABSTRACT

Forecasting societal events such as civil unrest, mass protests, and violent conflicts is a challenging problem with several important real-world applications in planning and policy making. While traditional forecasting approaches have typically relied on historical time series for generating such forecasts, recent research has focused on using open-source surrogate data for more accurate and timely forecasts. Furthermore, leveraging such data can also help to identify precursors of those events that can be used to gain insights into the generated forecasts. The key challenge is to develop a unified framework for forecasting and precursor identification that can deal with missing historical data. Other challenges include sufficient flexibility in handling different types of events and providing interpretable representations of identified precursors. Although existing methods exhibit promising performance for predictive modeling in event detection, these models do not adequately address the above challenges. Here, we propose a unified framework based on an attention-based long short-term memory (LSTM) model to simultaneously forecast events with sequential text datasets and identify precursors at different granularities, such as documents and document excerpts. The key idea is to leverage word context in sequential and time-stamped documents such as news articles and blogs for learning a rich set of precursors. We validate the proposed framework by conducting extensive experiments with two real-world datasets: military action and violent conflicts in the Middle East, and mass protests in Latin America. Our results show that overall, the proposed approach generates more accurate forecasts compared to the existing state-of-the-art methods, while at the same time producing a rich set of precursors for the forecasted events.
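
To make the idea of attention-derived precursors more concrete, here is a minimal sketch of attention pooling over a sequence of document representations. It is not the authors' model: the hidden states are random stand-ins for what an LSTM would produce, and the names `attention_pool`, `hidden_states`, and `query` are hypothetical. The point is only that softmax attention weights sum to one across documents, so the highest-weighted documents can be read as candidate precursors.

```python
import numpy as np

def attention_pool(hidden_states, query):
    """Dot-product attention over a sequence of vectors.

    hidden_states: array of shape (n_documents, hidden_dim)
    query:         array of shape (hidden_dim,)
    Returns (weights, context): one weight per document (summing to 1)
    and the weighted average of the hidden states.
    """
    scores = hidden_states @ query
    scores = scores - scores.max()  # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ hidden_states
    return weights, context

# Random stand-ins (assumption): 5 time-stamped documents, 8-dim states.
rng = np.random.default_rng(1)
hidden_states = rng.normal(size=(5, 8))
query = rng.normal(size=8)

weights, context = attention_pool(hidden_states, query)

# Documents ordered from most to least attended.
ranked_documents = np.argsort(weights)[::-1]
print("attention weights:", np.round(weights, 3))
print("documents ranked by weight:", ranked_documents)
```

In a trained forecasting model, `context` would feed the event classifier while `weights` would provide the per-document interpretation; the same pooling could be applied at the word or excerpt level for finer granularity.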

Improved multiple sequence alignments using coupled pattern mining.

Hossain, K S M Tozammel; Patnaik, Debprakash; Laxman, Srivatsan; Jain, Prateek; Bailey-Kellogg, Chris; Ramakrishnan, Naren.

IEEE/ACM Trans Comput Biol Bioinform ; 10(5): 1098-112, 2013.

Article in English | MEDLINE | ID: mdl-24384701

ABSTRACT

We present alignment refinement by mining coupled residues (ARMiCoRe), a novel approach to a classical bioinformatics problem, viz., multiple sequence alignment (MSA) of gene and protein sequences. Aligning multiple biological sequences is a key step in elucidating evolutionary relationships, annotating newly sequenced segments, and understanding the relationship between biological sequences and functions. Classical MSA algorithms are designed to primarily capture conservations in sequences whereas couplings, or correlated mutations, are well known as an additional important aspect of sequence evolution. (Two sequence positions are coupled when mutations in one are accompanied by compensatory mutations in another.) As a result, better exposition of couplings is sometimes one of the reasons for hand-tweaking of MSAs by practitioners. ARMiCoRe introduces a distinctly pattern-mining approach to improving MSAs: using frequent episode mining as a foundational basis, we define the notion of a coupled pattern and demonstrate how the discovery and tiling of coupled patterns using a max-flow approach can yield MSAs that are better than conservation-based alignments. Although we were motivated to improve MSAs for the sake of better exposing couplings, we demonstrate that our MSAs are also improvements in terms of traditional metrics of assessment. We demonstrate the effectiveness of ARMiCoRe on a large collection of data sets.

Subject(s)

Algorithms; Data Mining/methods; Pattern Recognition, Automated/methods; Proteins/chemistry; Proteins/genetics; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Amino Acid Sequence; Base Sequence; Conserved Sequence/genetics; Molecular Sequence Data
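
The notion of coupled positions in the abstract above can be illustrated with a simple covariation score. The sketch below uses mutual information between two alignment columns, which is a common stand-in for coupling and is not ARMiCoRe's method (the paper uses frequent episode mining and max-flow tiling). The function name `column_mutual_information` and the four-sequence toy alignment are made up for this example.

```python
from collections import Counter
from math import log2

def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    alignment: list of equal-length strings, one per aligned sequence.
    High values mean that the residue at column i predicts the residue
    at column j, which is what correlated (compensatory) mutations produce.
    """
    n = len(alignment)
    count_i = Counter(seq[i] for seq in alignment)
    count_j = Counter(seq[j] for seq in alignment)
    count_ij = Counter((seq[i], seq[j]) for seq in alignment)
    return sum(
        (c / n) * log2((c / n) / ((count_i[a] / n) * (count_j[b] / n)))
        for (a, b), c in count_ij.items()
    )

# Toy alignment (hypothetical):
#   column 0 is fully conserved,
#   columns 1 and 2 mutate together (K with D, R with E),
#   column 3 varies independently of column 1.
alignment = [
    "AKDL",
    "AKDV",
    "AREL",
    "AREV",
]

mi_coupled = column_mutual_information(alignment, 1, 2)
mi_uncoupled = column_mutual_information(alignment, 1, 3)
mi_conserved = column_mutual_information(alignment, 0, 1)
print(mi_coupled, mi_uncoupled, mi_conserved)
```

Note that a fully conserved column scores zero against every other column, which is one way to see why conservation-based scoring and coupling capture different aspects of an alignment.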
