Search | Biblioteca Virtual em Saúde

1.

Identification of Potential Biomarkers Using Integrative Approach: A Case Study of ESCC.

Saikia, Manaswita; Bhattacharyya, Dhruba K; Kalita, Jugal K.

SN Comput Sci ; 4(2): 114, 2023.

Article in English | MEDLINE | ID: mdl-36573207

ABSTRACT

This paper presents a consensus-based approach that incorporates three microarray and three RNA-Seq methods for unbiased and integrative identification of differentially expressed genes (DEGs) as potential biomarkers for critical diseases. The proposed method performs satisfactorily on two microarray datasets (GSE20347 and GSE23400) and one RNA-Seq dataset (GSE130078) for esophageal squamous cell carcinoma (ESCC). Based on the input dataset, our framework employs specific DE methods to detect DEGs independently. We introduce a consensus-based function that first considers the DEGs common to all three methods for downstream analysis. The consensus function employs other parameters to overcome information loss. Differential co-expression (DCE) and preservation analysis of DEGs facilitate the study of behavioral changes in interactions among DEGs under normal and diseased conditions. Considering hub genes in biologically relevant modules and the most GO- and pathway-enriched DEGs as candidates for potential biomarkers of ESCC, we perform further validation through biological analysis as well as literature evidence. We have identified 25 DEGs that have strong biological relevance to their respective datasets and have previous literature establishing them as potential biomarkers for ESCC. We have further identified 8 additional DEGs as probable potential biomarkers for ESCC, but recommend further in-depth analysis.
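The consensus step described in this abstract can be sketched as a simple vote over per-method DEG sets. This is only an illustration under stated assumptions: the function name `consensus_degs`, the `min_votes` relaxation, and the placeholder gene labels are hypothetical, and the abstract does not say which additional parameters the authors' consensus function actually uses.

```python
from collections import Counter


def consensus_degs(method_results, min_votes=None):
    """Return genes reported by at least `min_votes` methods.

    method_results: dict mapping a method name to the set of DEGs it called.
    min_votes: hypothetical relaxation knob; defaults to "all methods agree".
    """
    if min_votes is None:
        min_votes = len(method_results)
    # Count, for each gene, how many methods called it differentially expressed.
    votes = Counter(gene for genes in method_results.values() for gene in set(genes))
    return {gene for gene, n in votes.items() if n >= min_votes}


# Placeholder DEG calls from three hypothetical DE methods (not real results).
calls = {
    "method_a": {"G1", "G2", "G3", "G4"},
    "method_b": {"G2", "G3", "G4", "G5"},
    "method_c": {"G3", "G4", "G6"},
}

strict = consensus_degs(calls)                # genes common to all three methods
relaxed = consensus_degs(calls, min_votes=2)  # assumed fallback: two of three agree
print(sorted(strict), sorted(relaxed))
```

The strict intersection mirrors the "common to all three methods" rule; the relaxed vote is one possible way to limit information loss, not necessarily the one used in the paper.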

2.

DEGnext: classification of differentially expressed genes from RNA-seq data using a convolutional neural network with transfer learning.

Kakati, Tulika; Bhattacharyya, Dhruba K; Kalita, Jugal K; Norden-Krichmar, Trina M.

BMC Bioinformatics ; 23(1): 17, 2022 Jan 06.

Article in English | MEDLINE | ID: mdl-34991439

ABSTRACT

BACKGROUND: A limitation of traditional differential expression analysis on small datasets is the possibility of false positives and false negatives due to sample variation. Considering the recent advances in deep learning (DL) based models, we wanted to expand the state of the art in disease biomarker prediction from RNA-seq data using DL. However, applying DL to RNA-seq data is challenging due to the absence of appropriate labels and the small sample size compared to the number of genes. Deep learning coupled with transfer learning can improve prediction performance on novel data by incorporating patterns learned from other related data. With the emergence of new disease datasets, biomarker prediction would be facilitated by having a generalized model that can transfer the knowledge of trained feature maps to the new dataset. To the best of our knowledge, there is no Convolutional Neural Network (CNN)-based model coupled with transfer learning to predict the significant upregulating (UR) and downregulating (DR) genes from both trained and untrained datasets. RESULTS: We implemented a CNN model, DEGnext, to predict UR and DR genes from gene expression data obtained from The Cancer Genome Atlas database. DEGnext uses biologically validated data along with logarithmic fold change values to classify differentially expressed genes (DEGs) as UR and DR genes. We applied transfer learning to our model to leverage the knowledge of trained feature maps to untrained cancer datasets. DEGnext's results were competitive (ROC scores between 88 and 99%) with those of five traditional machine learning methods: Decision Tree, K-Nearest Neighbors, Random Forest, Support Vector Machine, and XGBoost. DEGnext was robust and effective in terms of transferring learned feature maps to facilitate classification of unseen datasets. Additionally, we validated that the predicted DEGs from DEGnext were mapped to significant Gene Ontology terms and pathways related to cancer. CONCLUSIONS: DEGnext can classify DEGs into UR and DR genes from RNA-seq cancer datasets with high performance. This type of analysis, using biologically relevant fine-tuning data, may aid in the exploration of potential biomarkers and can be adapted for other disease datasets.
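The role of logarithmic fold change in separating UR from DR genes can be illustrated with a small sketch. It covers only a labeling step, not the CNN or the transfer learning; the pseudocount, the threshold of 1.0, and the function names are assumptions made for illustration and are not taken from DEGnext.

```python
import math


def log2_fold_change(tumor_mean, normal_mean, pseudocount=1.0):
    """log2 ratio of mean expression; the pseudocount (assumed) avoids log(0)."""
    return math.log2((tumor_mean + pseudocount) / (normal_mean + pseudocount))


def label_gene(lfc, threshold=1.0):
    """Label a gene as upregulating (UR), downregulating (DR), or neutral.

    The threshold is a hypothetical cut-off, not the one used by the authors.
    """
    if lfc >= threshold:
        return "UR"
    if lfc <= -threshold:
        return "DR"
    return "neutral"


# Hypothetical mean expression values (tumor, normal) for three genes.
examples = {"gene_x": (7.0, 1.0), "gene_y": (1.0, 7.0), "gene_z": (3.0, 3.0)}
labels = {g: label_gene(log2_fold_change(t, n)) for g, (t, n) in examples.items()}
print(labels)
```

Labels produced this way could serve as training targets for a classifier, which is the general idea the abstract describes.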

Subjects

Neoplasms; Neural Networks, Computer; Humans; Machine Learning; RNA-Seq; Support Vector Machine

3.

A Survey of the Usages of Deep Learning for Natural Language Processing.

Otter, Daniel W; Medina, Julian R; Kalita, Jugal K.

IEEE Trans Neural Netw Learn Syst ; 32(2): 604-624, 2021 Feb.

Article in English | MEDLINE | ID: mdl-32324570

ABSTRACT

Over the last several years, the field of natural language processing has been propelled forward by an explosion in the use of deep learning models. This article provides a brief introduction to the field and a quick overview of deep learning architectures and methods. It then sifts through the plethora of recent studies and summarizes a large assortment of relevant contributions. Analyzed research areas include several core linguistic processing issues in addition to many applications of computational linguistics. A discussion of the current state of the art is then provided along with recommendations for future research in the field.

Subjects

Deep Learning; Natural Language Processing; Computer Systems; Humans; Linguistics; Neural Networks, Computer; Surveys and Questionnaires

4.

Prioritizing disease biomarkers using functional module based network analysis: A multilayer consensus driven scheme.

Jha, Monica; Roy, Swarup; Kalita, Jugal K.

Comput Biol Med ; 126: 104023, 2020 Nov.

Article in English | MEDLINE | ID: mdl-33049478

ABSTRACT

Many complex diseases occur due to genetic factors. A perturbation in the pathway of gene interactions leads to such disorders. Even though a group of genes is responsible, a few significant genes act as biomarkers for the disease, perturbing the healthy network. Identifying such marker genes, or a set of genes that play a pivotal role in diseases, helps drug prioritization. We propose a scheme for finding potential biomarkers using a multi-layer consensus-driven approach. We reconstruct a functional-module-guided disease sub-network, followed by a multi-step consensus of network inference methods and shared ontological terms. We perform centrality analysis on the sub-networks under consideration and report hub genes as potentially key players in the target disease. To establish our scheme's effectiveness, we use Alzheimer's Disease (AD) and Breast Cancer as candidate diseases for experimentation. We evaluate the significance of prioritized genes based on reported evidence. We observe that BRCA1, BRCA2, and PTEN are the essential genes for Breast Cancer, whereas MAPK1, APP, and CASP7 are the essential genes playing an important role in AD.
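The centrality analysis mentioned in this abstract can be illustrated with degree centrality, the simplest hub measure. The sketch below assumes an undirected edge list with placeholder node names; the abstract does not state which centrality measures the authors used, so `hub_genes` is a hypothetical helper rather than their procedure.

```python
from collections import defaultdict


def hub_genes(edges, top_k=1):
    """Rank nodes of an undirected network by degree and return the top_k."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Sort by descending degree; break ties alphabetically for determinism.
    ranked = sorted(degree, key=lambda g: (-degree[g], g))
    return ranked[:top_k]


# Placeholder sub-network (node names are illustrative, not real genes).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
print(hub_genes(edges, top_k=2))  # ['A', 'B']
```

In practice a sub-network would first be reconstructed from expression data, and several centrality measures might be combined before hubs are reported.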

Subjects

Alzheimer Disease; Gene Regulatory Networks; Alzheimer Disease/genetics; Biomarkers; Consensus; Gene Expression Profiling; Gene Regulatory Networks/genetics; Humans

5.

X-Module: A novel fusion measure to associate co-expressed gene modules from condition-specific expression profiles.

Kakati, Tulika; Bhattacharyya, Dhruba K; Kalita, Jugal K.

J Biosci ; 45, 2020.

Article in English | MEDLINE | ID: mdl-32098912

ABSTRACT

A gene co-expression network (CEN) is of biological interest, since co-expressed genes share common functions and biological processes or pathways. Finding relationships among modules can reveal inter-modular preservation and similarity in transcriptomic, functional, and biological behaviors among modules of the same or two different datasets. No existing method explores the one-to-one and one-to-many relationships among modules extracted from control and disease samples based on both topological and semantic similarity using both microarray and RNA-seq data. In this work, we propose a novel fusion measure to detect mappings between two sets of co-expressed modules extracted from the control and disease stages of Alzheimer's disease (AD) and Parkinson's disease (PD) datasets. Our measure considers both topological and biological information of a module and is an estimation of four parameters, namely, semantic similarity, eigengene correlation, degree difference, and the number of common genes. We analyze the consensus modules shared between the control and disease stages in terms of their association with diseases. We also validate the close associations between human and chimpanzee modules and compare with the state-of-the-art method. Additionally, we propose two novel observations on the relationships between modules for further analysis.

Subjects

Gene Expression Regulation; Gene Regulatory Networks/physiology; Transcriptome; Algorithms; Alzheimer Disease/genetics; Alzheimer Disease/metabolism; Animals; Databases, Genetic; Humans; Pan troglodytes; Parkinson Disease/genetics; Parkinson Disease/metabolism

6.

Comparison of Methods for Differential Co-expression Analysis for Disease Biomarker Prediction.

Kakati, Tulika; Bhattacharyya, Dhruba K; Barah, Pankaj; Kalita, Jugal K.

Comput Biol Med ; 113: 103380, 2019 Oct.

Article in English | MEDLINE | ID: mdl-31415946

ABSTRACT

In the recent past, a number of methods have been developed for the analysis of biological data. Among these methods, gene co-expression networks can mine functionally related genes with similar co-expression patterns, which is why such networks have been the most widely used. However, gene co-expression networks cannot identify genes that undergo condition-specific changes in their relationships with other genes. In contrast, differential co-expression analysis enables finding co-expressed genes exhibiting significant changes across disease conditions. In this paper, we present some significant outcomes of a comparative study of four co-expression network module detection techniques, namely, THD-Module Extractor, DiffCoEx, MODA, and WGCNA, which can perform differential co-expression analysis on both gene and miRNA expression data (microarray and RNA-seq), and discuss their applications to Alzheimer's disease and Parkinson's disease research. Our observations reveal that, compared to the other methods, THD-Module Extractor is the most effective in finding modules with higher functional relevance and biological significance.

Subjects

Alzheimer Disease; Databases, Genetic; Gene Expression Profiling; Gene Regulatory Networks; Parkinson Disease; Transcriptome; Alzheimer Disease/genetics; Alzheimer Disease/metabolism; Biomarkers/metabolism; Humans; Parkinson Disease/genetics; Parkinson Disease/metabolism

7.

Intrinsic-overlapping co-expression module detection with application to Alzheimer's Disease.

Manners, Hazel Nicolette; Roy, Swarup; Kalita, Jugal K.

Comput Biol Chem ; 77: 373-389, 2018 Dec.

Article in English | MEDLINE | ID: mdl-30466046

ABSTRACT

Genes interact with each other and may cause perturbation in the molecular pathways leading to complex diseases. Often, instead of any single gene, a subset of genes interact, forming a network, to share common biological functions. Such a subnetwork is called a functional module or motif. Identifying such modules and the central key genes in them that may be responsible for a disease may help design patient-specific drugs. In this study, we consider the neurodegenerative Alzheimer's Disease (AD) and identify potentially responsible genes from functional motif analysis. We start from the hypothesis that central genes in genetic modules are more relevant to a disease under investigation and identify hub genes from the modules as potential marker genes. Motifs or modules are often non-exclusive or overlapping in nature. Moreover, they sometimes show intrinsic or hierarchical distributions with overlapping functional roles. To the best of our knowledge, no prior work handles both situations in an integrated way. We propose a non-exclusive clustering approach, CluViaN (Clustering Via Network), that can detect intrinsic as well as overlapping modules from gene co-expression networks constructed using microarray expression profiles. We compare our method with existing methods to evaluate the quality of the modules extracted. CluViaN reports the presence of intrinsic and overlapping motifs in different species not reported by any other research. We further apply CluViaN to extract significant AD-specific modules and rank them based on the number of genes from a module involved in the disease pathways. Finally, top central genes are identified by topological analysis of the modules. We use two different AD phenotype datasets for experimentation. We observe that central genes, namely PSEN1, APP, NDUFB2, NDUFA1, UQCR10, PPP3R1 and a few more, play significant roles in AD. Interestingly, our experiments also find a hub gene, PML, which has recently been reported to play a role in plasticity, circadian rhythms and the response to proteins which can cause neurodegenerative disorders. MUC4, another hub gene that we find experimentally, is yet to be investigated for its potential role in AD. A software implementation of CluViaN in Java is available for download at https://sites.google.com/site/swarupnehu/publications/resources/CluViaN Software.rar.

Subjects

Alzheimer Disease/genetics; Gene Regulatory Networks; Algorithms; Cluster Analysis; Gene Expression Regulation; Genomics/methods; Humans; Phenotype; Transcriptome

8.

THD-Tricluster: A robust triclustering technique and its application in condition specific change analysis in HIV-1 progression data.

Kakati, Tulika; Ahmed, Hasin A; Bhattacharyya, Dhruba K; Kalita, Jugal K.

Comput Biol Chem ; 75: 154-167, 2018 Aug.

Article in English | MEDLINE | ID: mdl-29787933

ABSTRACT

Developing a cost-effective and robust triclustering algorithm that can identify triclusters of high biological significance in the gene-sample-time (GST) domain is a challenging task. Most existing triclustering algorithms can detect shifting and scaling patterns in isolation, but they cannot handle co-occurring shifting-and-scaling patterns. This paper attempts to address this issue. It introduces a robust triclustering algorithm called THD-Tricluster to identify triclusters over the GST domain. In addition to being validated on several benchmark datasets, the proposed THD-Tricluster algorithm was applied to HIV-1 progression data to identify disease-specific genes. THD-Tricluster identified the 38 genes most responsible for the disease, including GATA3, EGR1, JUN, ELF1, AGFG1, AGFG2, CX3CR1, CXCL12, CCR5, CCR2, and many others. The results are validated using GeneCard and other established results.

Subjects

Algorithms; HIV-1/genetics; Cluster Analysis; HIV-1/isolation & purification; Humans; Oligonucleotide Array Sequence Analysis

9.

Protein complex finding and ranking: An application to Alzheimer's disease.

Sharma, Pooja; Bhattacharyya, Dhruba K; Kalita, Jugal K.

J Biosci ; 42(3): 383-396, 2017 Sep.

Article in English | MEDLINE | ID: mdl-29358552

ABSTRACT

Protein complexes are known to play a major role in controlling cellular activity in a living being. Identifying complexes from raw protein-protein interactions (PPIs) is an important area of research. Earlier work has been limited mostly to yeast and a few other model organisms. Such protein complex identification methods, when applied to large human PPIs, often perform poorly. We introduce a novel method called ComFiR to detect such protein complexes and further rank diseased complexes based on a query disease. We show that it performs better in identifying protein complexes from human PPI data. The method is evaluated in terms of positive predictive value, sensitivity and accuracy. We also introduce a ranking approach and show its application to Alzheimer's disease.

Subjects

Algorithms; Alzheimer Disease/metabolism; Computational Biology/methods; Protein Interaction Mapping/statistics & numerical data; Alzheimer Disease/diagnosis; Alzheimer Disease/pathology; Databases, Protein; Humans; Protein Binding

10.

Analysis of Gene Expression Patterns Using Biclustering.

Roy, Swarup; Bhattacharyya, Dhruba K; Kalita, Jugal K.

Methods Mol Biol ; 1375: 91-103, 2016.

Article in English | MEDLINE | ID: mdl-26350227

ABSTRACT

Mining microarray data to unearth interesting expression profile patterns for discovery of in silico biological knowledge is an emerging area of research in computational biology. A group of functionally related genes may have similar expression patterns under a set of conditions or at some time points. Biclustering is an important data mining tool that has been successfully used to analyze gene expression data for biologically significant cluster discovery. The purpose of this chapter is to introduce interesting patterns that may be observed in expression data and discuss the role of biclustering techniques in detecting interesting functional gene groups with similar expression patterns.

Subjects

Cluster Analysis; Computational Biology/methods; Gene Expression Profiling/methods; Animals; Data Mining/methods; Gene Expression Regulation; Humans; Oligonucleotide Array Sequence Analysis/methods; Reproducibility of Results

11.

Core and peripheral connectivity based cluster analysis over PPI network.

Ahmed, Hasin A; Bhattacharyya, Dhruba K; Kalita, Jugal K.

Comput Biol Chem ; 59 Pt B: 32-41, 2015 Dec.

Article in English | MEDLINE | ID: mdl-26362299

ABSTRACT

A number of methods have been proposed in the literature on protein-protein interaction (PPI) network analysis for detecting clusters in the network. These methods identify clusters using various graph-theoretic criteria. Most of them have been found to be time-consuming because they involve preprocessing and post-processing tasks. In addition, they do not achieve high precision and recall consistently and simultaneously. Moreover, the existing methods do not effectively employ the core-periphery structural pattern of protein complexes to extract clusters. In this paper, we introduce a clustering method named CPCA, based on a recent observation by researchers that a protein complex in a PPI network is arranged as a relatively dense core region with additional proteins weakly connected to the core. CPCA uses two connectivity criterion functions to identify the core and peripheral regions of a cluster. To locate the initial node of a cluster, we introduce a measure called the DNQ (Degree based Neighborhood Qualification) index, which evaluates the tendency of a node to be part of a cluster. CPCA performs well when compared with well-known counterparts. Along with protein complex gold standards, a co-localization dataset has also been used to validate the results.

Subjects

Protein Interaction Maps; Proteins/chemistry; Cluster Analysis; Databases, Protein; Protein Binding; Protein Interaction Mapping; Reproducibility of Results

12.

Reconstruction of gene co-expression network from microarray data using local expression patterns.

Roy, Swarup; Bhattacharyya, Dhruba K; Kalita, Jugal K.

BMC Bioinformatics ; 15 Suppl 7: S10, 2014.

Article in English | MEDLINE | ID: mdl-25079873

ABSTRACT

BACKGROUND: Biological networks connect genes and gene products to one another. A network of co-regulated genes may form gene clusters that can encode proteins and take part in common biological processes. A gene co-expression network describes inter-relationships among genes. Existing techniques generally depend on proximity measures based on global similarity to draw the relationship between genes. It has been observed that expression profiles share local similarity rather than global similarity. We propose an expression pattern based method called GeCON to extract a Gene CO-expression Network from microarray data. Pair-wise supports are computed for each pair of genes based on the changing tendencies and regulation patterns of the gene expression. Gene pairs showing negative or positive co-regulation under a given number of conditions are used to construct the gene co-expression network. We construct the co-expression network with signed edges to reflect up- and down-regulation between pairs of genes. Most existing techniques do not emphasize computational efficiency. We exploit a fast correlogram matrix based technique for capturing the support of each gene pair to construct the network. RESULTS: We apply GeCON to both real and synthetic gene expression data. We compare our results using the DREAM (Dialogue for Reverse Engineering Assessments and Methods) Challenge data with three well-known algorithms, viz., ARACNE, CLR and MRNET. Our method outperforms the other algorithms in in silico regulatory network reconstruction. Experimental results show that GeCON can extract functionally enriched network modules from real expression data. CONCLUSIONS: In view of the results over several in silico and real expression datasets, the proposed GeCON shows satisfactory performance in predicting co-expression networks in a computationally inexpensive way. We further establish that simple expression pattern matching is helpful in finding biologically relevant gene networks. In the future, we aim to introduce an enhanced GeCON to identify protein-protein interaction network complexes by incorporating a variable density concept.
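The idea of pair-wise support from changing tendencies with signed edges can be sketched as follows. This is a deliberately simplified illustration, not GeCON itself: it compares up/down changes between consecutive conditions directly instead of using the correlogram matrix technique, and the function names and the `min_support` threshold are assumptions.

```python
def tendency(profile):
    """Encode changes between consecutive conditions as +1 (up), -1 (down), 0 (flat)."""
    return [(b > a) - (b < a) for a, b in zip(profile, profile[1:])]


def signed_edge(p, q, min_support):
    """Return +1, -1, or 0 for a pair of expression profiles.

    +1: the genes change in the same direction in at least `min_support` steps.
    -1: they change in opposite directions in at least `min_support` steps.
     0: no edge. `min_support` is a hypothetical user-chosen threshold.
    """
    tp, tq = tendency(p), tendency(q)
    same = sum(1 for x, y in zip(tp, tq) if x != 0 and x == y)
    opposite = sum(1 for x, y in zip(tp, tq) if x != 0 and x == -y)
    if same >= min_support:
        return 1
    if opposite >= min_support:
        return -1
    return 0


# Hypothetical expression profiles over five conditions.
g1 = [1, 2, 3, 2, 4]
g2 = [2, 3, 5, 1, 6]   # rises and falls together with g1
g3 = [5, 4, 2, 3, 1]   # moves opposite to g1
g4 = [1, 1, 1, 1, 1]   # flat, so it supports no edge

print(signed_edge(g1, g2, 3), signed_edge(g1, g3, 3), signed_edge(g1, g4, 3))
```

Because only the direction of change is compared, two genes with very different expression magnitudes can still be linked, which reflects the local-similarity argument made in the abstract.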

Subjects

Gene Expression Profiling; Gene Regulatory Networks; Oligonucleotide Array Sequence Analysis; Algorithms; Computer Simulation; Down-Regulation; Gene Expression; Gene Expression Profiling/methods; Humans; Models, Genetic; Oligonucleotide Array Sequence Analysis/methods

13.

An effective method for network module extraction from microarray data.

Mahanta, Priyakshi; Ahmed, Hasin A; Bhattacharyya, Dhruba K; Kalita, Jugal K.

BMC Bioinformatics ; 13 Suppl 13: S4, 2012.

Article in English | MEDLINE | ID: mdl-23320896

ABSTRACT

BACKGROUND: The development of high-throughput microarray technologies has provided various opportunities to systematically characterize diverse types of computational biological networks. Co-expression networks have become popular in the analysis of microarray data, for example for detecting functional gene modules. RESULTS: This paper presents a method to build a co-expression network (CEN) and to detect network modules from the built network. We use an effective gene expression similarity measure called NMRS (Normalized mean residue similarity) to construct the CEN. We have tested our method on five publicly available benchmark microarray datasets. The network modules extracted by our algorithm have been biologically validated in terms of Q value and p value. CONCLUSIONS: Our results show that the technique is capable of detecting biologically significant network modules from the co-expression network. Biologists can use this technique to find groups of genes with similar functionality based on their expression information.

Subjects

Computational Biology/methods; Data Interpretation, Statistical; Gene Expression Profiling/statistics & numerical data; Gene Regulatory Networks; Oligonucleotide Array Sequence Analysis/statistics & numerical data; Algorithms; Databases, Genetic/statistics & numerical data; Gene Expression
