Search | Portal de Pesquisa da BVS Enfermagem

Identifying gene expression programs in single-cell RNA-seq data using linear correlation explanation.

Nussbaum, Yulia I; Hossain, K S M Tozammel; Kaifi, Jussuf; Warren, Wesley C; Shyu, Chi-Ren; Mitchem, Jonathan B.

J Biomed Inform ; 154: 104644, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38631462

ABSTRACT

OBJECTIVE: Gene expression analysis through single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of gene regulation in diverse cell types, tissues, and organisms. While existing methods primarily focus on identifying cell type-specific gene expression programs (GEPs), the characterization of GEPs associated with biological processes and stimuli responses remains limited. In this study, we aim to infer biologically meaningful GEPs that are associated with both cellular phenotypes and activity programs directly from scRNA-seq data. METHODS: We applied linear CorEx, a machine-learning-based approach, to infer GEPs by grouping genes based on a total correlation optimization function in simulated and real-world scRNA-seq datasets. Additionally, we used a transfer learning approach to project CorEx-inferred GEPs onto other scRNA-seq datasets. RESULTS: By leveraging total correlation optimization, linear CorEx groups genes and, on simulated data, demonstrates superior performance in identifying cell types and activity programs compared to similar methods. Furthermore, we apply this same approach to real-world scRNA-seq data from the mouse dentate gyrus and embryonic colon development, uncovering biologically relevant GEPs related to cell types, developmental ages, and cell cycle programs. We also demonstrate the potential for transfer learning by evaluating similar datasets, showcasing the cross-species sensitivity of linear CorEx. CONCLUSION: Our findings validate linear CorEx as a valuable tool for comprehensively analyzing complex signals in scRNA-seq data, leading to deeper insights into gene expression dynamics, cellular heterogeneity, and regulatory mechanisms.
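
As an illustration of the quantity that the abstract says is optimized, the sketch below computes total correlation for a small group of simulated genes. It is not the linear CorEx algorithm itself: it assumes jointly Gaussian expression values, for which total correlation reduces to -1/2 times the log-determinant of the correlation matrix. The function names, the simulated "program" signal, and the noise level are all hypothetical choices made for this example.

```python
import math
import random


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j]))
            / (norms[i] * norms[j])
            for j in range(k)
        ]
        for i in range(k)
    ]


def log_det(matrix):
    """Log-determinant of a positive-definite matrix via Gaussian elimination."""
    m = [row[:] for row in matrix]
    total = 0.0
    for i in range(len(m)):
        pivot = m[i][i]
        total += math.log(pivot)
        for r in range(i + 1, len(m)):
            factor = m[r][i] / pivot
            for c in range(i, len(m)):
                m[r][c] -= factor * m[i][c]
    return total


def gaussian_total_correlation(columns):
    """Total correlation (in nats) under a Gaussian assumption."""
    return -0.5 * log_det(correlation_matrix(columns))


# Simulated data (assumption): three genes driven by one shared program,
# and three genes that vary independently of each other.
rng = random.Random(0)
n_cells = 500
program = [rng.gauss(0, 1) for _ in range(n_cells)]
coexpressed = [[p + 0.5 * rng.gauss(0, 1) for p in program] for _ in range(3)]
independent = [[rng.gauss(0, 1) for _ in range(n_cells)] for _ in range(3)]

tc_coexpressed = gaussian_total_correlation(coexpressed)
tc_independent = gaussian_total_correlation(independent)
```

Genes that share a driving signal carry much more total correlation than independent genes, which is the intuition behind grouping genes into programs by this criterion.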

Subjects

Machine Learning; RNA-Seq; Single-Cell Analysis; Single-Cell Analysis/methods; Animals; Mice; RNA-Seq/methods; Gene Expression Profiling/methods; Sequence Analysis, RNA/methods; Computational Biology/methods; Humans; Dentate Gyrus/metabolism; Algorithms; Colon/metabolism; Colon/cytology; Single-Cell Gene Expression Analysis

Identifying geopolitical event precursors using attention-based LSTMs.

Hossain, K S M Tozammel; Harutyunyan, Hrayr; Ning, Yue; Kennedy, Brendan; Ramakrishnan, Naren; Galstyan, Aram.

Front Artif Intell ; 5: 893875, 2022.

Article in English | MEDLINE | ID: mdl-36388399

ABSTRACT

Forecasting societal events such as civil unrest, mass protests, and violent conflicts is a challenging problem with several important real-world applications in planning and policy making. While traditional forecasting approaches have typically relied on historical time series for generating such forecasts, recent research has focused on using open-source surrogate data for more accurate and timely forecasts. Furthermore, leveraging such data can also help to identify precursors of those events that can be used to gain insights into the generated forecasts. The key challenge is to develop a unified framework for forecasting and precursor identification that can deal with missing historical data. Other challenges include sufficient flexibility in handling different types of events and providing interpretable representations of identified precursors. Although existing methods exhibit promising performance for predictive modeling in event detection, these models do not adequately address the above challenges. Here, we propose a unified framework based on an attention-based long short-term memory (LSTM) model to simultaneously forecast events with sequential text datasets and identify precursors at different granularities, such as documents and document excerpts. The key idea is to leverage word context in sequential and time-stamped documents such as news articles and blogs for learning a rich set of precursors. We validate the proposed framework by conducting extensive experiments with two real-world datasets: military action and violent conflicts in the Middle East, and mass protests in Latin America. Our results show that, overall, the proposed approach generates more accurate forecasts than existing state-of-the-art methods, while at the same time producing a rich set of precursors for the forecasted events.
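
To show how attention weights can double as precursor scores, here is a minimal sketch of dot-product attention pooling over per-document hidden states. The abstract does not specify the model's architecture, so everything here is an assumption: the hidden states are hard-coded stand-ins for LSTM outputs, the `query` vector stands in for a learned context vector, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Return attention weights and the weighted sum of the hidden states."""
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(query)
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Assumed inputs: one hidden state per time-stamped document, oldest first.
hidden_states = [[0.1, 0.0], [0.2, 0.1], [1.5, 1.0], [0.3, 0.2]]
query = [1.0, 1.0]  # stand-in for a learned context vector

weights, context = attention_pool(hidden_states, query)

# Documents ranked by attention weight; the top ones are candidate precursors.
ranked_docs = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
```

In a full model the pooled `context` vector would feed the event classifier, while the same `weights` give an interpretable ranking of the documents that influenced the forecast.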

Improved multiple sequence alignments using coupled pattern mining.

Hossain, K S M Tozammel; Patnaik, Debprakash; Laxman, Srivatsan; Jain, Prateek; Bailey-Kellogg, Chris; Ramakrishnan, Naren.

IEEE/ACM Trans Comput Biol Bioinform ; 10(5): 1098-112, 2013.

Article in English | MEDLINE | ID: mdl-24384701

ABSTRACT

We present alignment refinement by mining coupled residues (ARMiCoRe), a novel approach to a classical bioinformatics problem, viz., multiple sequence alignment (MSA) of gene and protein sequences. Aligning multiple biological sequences is a key step in elucidating evolutionary relationships, annotating newly sequenced segments, and understanding the relationship between biological sequences and functions. Classical MSA algorithms are designed primarily to capture conservations in sequences, whereas couplings, or correlated mutations, are well known as an additional important aspect of sequence evolution. (Two sequence positions are coupled when mutations in one are accompanied by compensatory mutations in another.) As a result, better exposition of couplings is sometimes one of the reasons for hand-tweaking of MSAs by practitioners. ARMiCoRe introduces a distinctly pattern-mining approach to improving MSAs: using frequent episode mining as a foundational basis, we define the notion of a coupled pattern and demonstrate how the discovery and tiling of coupled patterns using a max-flow approach can yield MSAs that are better than conservation-based alignments. Although we were motivated to improve MSAs for the sake of better exposing couplings, we demonstrate that our MSAs are also improvements in terms of traditional metrics of assessment. We demonstrate the effectiveness of ARMiCoRe on a large collection of data sets.
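
The difference between a conserved column and a coupled pair of columns can be made concrete with a small sketch. Note that this uses mutual information between alignment columns, a common coupling score swapped in here for illustration; it is not ARMiCoRe's frequent episode mining or its max-flow tiling. The toy alignment and the function names are hypothetical.

```python
import math
from collections import Counter


def column(alignment, index):
    """Extract one column (one residue per sequence) from an alignment."""
    return [seq[index] for seq in alignment]


def mutual_information(col_a, col_b):
    """Mutual information, in bits, between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * math.log2(p_ab * n * n / (count_a[a] * count_b[b]))
    return mi


# Toy alignment (assumption): columns 0 and 2 are conserved, while
# columns 1 and 3 mutate together (K pairs with E, R pairs with D).
alignment = [
    "AKLE",
    "AKLE",
    "ARLD",
    "ARLD",
    "AKLE",
    "ARLD",
]

mi_coupled = mutual_information(column(alignment, 1), column(alignment, 3))
mi_conserved = mutual_information(column(alignment, 0), column(alignment, 1))
```

A fully conserved column carries no coupling signal with any other column, whereas the two co-varying columns share one full bit of information, which is the kind of relationship a conservation-only alignment score does not reward.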

Subjects

Algorithms; Data Mining/methods; Pattern Recognition, Automated/methods; Proteins/chemistry; Proteins/genetics; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Amino Acid Sequence; Base Sequence; Conserved Sequence/genetics; Molecular Sequence Data
