Results 1 - 11 of 11
1.
Bioinformatics ; 40(8)2024 Aug 02.
Article in English | MEDLINE | ID: mdl-39120880

ABSTRACT

MOTIVATION: Although human tissues carry out common molecular processes, gene expression patterns can distinguish different tissues. Traditional informatics methods, which work primarily at the gene level, overlook the complexity of the alternative transcript variants and protein isoforms produced by most genes, changes in which are linked to disease prognosis and drug resistance. RESULTS: We developed TransTEx (Transcript-level Tissue Expression), a novel tissue-specificity scoring method, for grouping transcripts into four expression groups. TransTEx applies sequential cut-offs to tissue-wise transcript probability estimates, subsampling-based P-values and fold-change estimates. Application of TransTEx to GTEx mRNA-seq data divided 199 166 human transcripts into the following groups: 17 999 tissue-specific (TSp), 7436 tissue-enhanced, 36 783 widely expressed (Wide), 79 191 lowly expressed (Low), and 57 757 no expression (Null) transcripts. Testis has the most TSp isoforms (13 466), followed by liver (890), brain (701), pituitary (435), and muscle (420). We found that the tissue specificity of alternative transcripts of a gene is predominantly influenced by alternate promoter usage. By overlapping brain-specific transcripts with the cell-type gene markers in the scBrainMap database, we found that 63% of the brain-specific transcripts were enriched in nonneuronal cell types, predominantly astrocytes followed by endothelial cells and oligodendrocytes. In addition, we found 61 brain cell-type marker genes encoding a total of 176 alternative transcripts that were brain-specific and 22 alternative transcripts that were testis-specific, highlighting the complexity of TSp and cell-type-specific gene regulation and expression at the isoform level. TransTEx can be adapted to the analysis of bulk RNA-seq or scRNA-seq datasets to find tissue- and/or cell-type-specific isoform-level gene markers.
AVAILABILITY AND IMPLEMENTATION: TransTEx database: https://bmi.cewit.stonybrook.edu/transtexdb/ and the R package is available via GitHub: https://github.com/pallavisurana1/TransTEx.
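The idea of applying sequential cut-offs to per-tissue probability, P-value and fold-change estimates can be illustrated with a minimal sketch. This is not the TransTEx implementation: the function name, the thresholds, the order of the checks and the rule that separates tissue-specific from tissue-enhanced transcripts are all assumptions made for illustration.

```python
from typing import Dict, NamedTuple


class TissueStats(NamedTuple):
    prob: float         # estimated probability that the transcript is expressed in the tissue
    p_value: float      # subsampling-based P-value for that tissue
    fold_change: float  # expression in the tissue relative to the other tissues


def classify(stats: Dict[str, TissueStats],
             min_prob: float = 0.5,       # hypothetical cut-off
             max_p: float = 0.05,         # hypothetical cut-off
             min_fc: float = 4.0,         # hypothetical cut-off
             wide_fraction: float = 0.8   # hypothetical cut-off
             ) -> str:
    """Assign one transcript to a group by applying the cut-offs one after another."""
    # Step 1: probability cut-off decides where the transcript counts as expressed.
    expressed = [t for t, s in stats.items() if s.prob >= min_prob]
    if not expressed:
        return "Null" if all(s.prob == 0.0 for s in stats.values()) else "Low"
    # Step 2: P-value and fold-change cut-offs keep only strongly enriched tissues.
    enriched = [t for t in expressed
                if stats[t].p_value <= max_p and stats[t].fold_change >= min_fc]
    if len(enriched) == 1:
        return "TSp"    # enriched in exactly one tissue (assumed rule)
    if len(enriched) > 1:
        return "TEnh"   # enriched in a few tissues (assumed rule)
    # Step 3: no enrichment; expressed almost everywhere means widely expressed.
    if len(expressed) >= wide_fraction * len(stats):
        return "Wide"
    return "Low"
```

A call such as `classify({"testis": TissueStats(0.9, 0.001, 10.0), "liver": TissueStats(0.1, 0.8, 0.2)})` falls through the first two steps and is labelled tissue-specific under these assumed thresholds.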


Subject(s)
Organ Specificity , Transcriptome , Humans , Transcriptome/genetics , Software , Gene Expression Profiling/methods
2.
Bioinform Adv ; 3(1): vbad075, 2023.
Article in English | MEDLINE | ID: mdl-37424943

ABSTRACT

Motivation: Molecular subtyping by integrative modeling of multi-omics and clinical data can help identify robust and clinically actionable disease subgroups, an essential step in developing precision medicine approaches. Results: We developed a novel outcome-guided molecular subgrouping framework, called Deep Multi-Omics Integrative Subtyping by Maximizing Correlation (DeepMOIS-MC), for integrative learning from multi-omics data by maximizing correlation between all input -omics views. DeepMOIS-MC consists of two parts: clustering and classification. In the clustering part, the preprocessed high-dimensional multi-omics views are input into two-layer fully connected neural networks. The outputs of the individual networks are subjected to a Generalized Canonical Correlation Analysis loss to learn the shared representation. Next, the learned representation is filtered by a regression model to select features that are related to a clinical covariate, for example, survival or another outcome. The filtered features are used for clustering to determine the optimal cluster assignments. In the classification part, the original feature matrix of one of the -omics views is scaled and discretized by equal-frequency binning, and then subjected to feature selection using RandomForest. Using these selected features, classification models (for example, an XGBoost model) are built to predict the molecular subgroups that were identified in the clustering part. We applied DeepMOIS-MC to lung and liver cancers, using TCGA datasets. In comparative analysis, we found that DeepMOIS-MC outperformed traditional approaches in patient stratification. Finally, we validated the robustness and generalizability of the classification models on independent datasets. We anticipate that DeepMOIS-MC can be adapted to many multi-omics integrative analysis tasks.
Availability and implementation: Source codes for PyTorch implementation of DGCCA and other DeepMOIS-MC modules are available at GitHub (https://github.com/duttaprat/DeepMOIS-MC). Supplementary information: Supplementary data are available at Bioinformatics Advances online.
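Equal-frequency binning, the discretization step named in the abstract, can be sketched in a few lines. This is a generic illustration rather than the DeepMOIS-MC code; the function name and the rank-based handling of ties are assumptions.

```python
def equal_frequency_bins(values, n_bins):
    """Discretize values so that each bin receives (almost) the same number of items.

    Returns one bin index (0 .. n_bins - 1) per input value. Ties are split by
    rank, so equal values can land in neighbouring bins (an assumed convention).
    """
    # Indices of the values, ordered from the smallest value to the largest.
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        # Map the rank 0 .. n-1 onto a bin 0 .. n_bins-1 in equal-sized chunks.
        bins[idx] = rank * n_bins // len(values)
    return bins
```

For the six values `[5.0, 1.0, 3.0, 9.0, 7.0, 2.0]` and three bins, the two smallest values go to bin 0, the two middle values to bin 1 and the two largest to bin 2, regardless of how far apart the values are.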

3.
Anal Chem ; 95(19): 7779-7787, 2023 05 16.
Article in English | MEDLINE | ID: mdl-37141575

ABSTRACT

The cascade of immune responses involves activation of diverse immune cells and release of a large amount of cytokines, which leads to either normal, balanced inflammation or hyperinflammatory responses and even organ damage by sepsis. Conventional diagnosis of immunological disorders based on multiple cytokines in the blood serum has varied accuracy, and it is difficult to distinguish normal inflammation from sepsis. Herein, we present an approach to detect immunological disorders through rapid, ultrahigh-multiplex analysis of T cells using single-cell multiplex in situ tagging (scMIST) technology. scMIST permits simultaneous detection of 46 markers and cytokines from single cells without the assistance of special instruments. A cecal ligation and puncture sepsis model was built to supply T cells from two groups of mice that survived the surgery or died after 1 day. The scMIST assays have captured the T cell features and the dynamics over the course of recovery. Compared with cytokines in the peripheral blood, T cell markers show different dynamics and cytokine levels. We have applied a random forest machine learning model to single T cells from two groups of mice. Through training, the model has been able to predict the group of mice through T cell classification and majority rule with 94% accuracy. Our approach pioneers the direction of single-cell omics and could be widely applicable to human diseases.
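The majority-rule step, in which per-cell classifier outputs are pooled into one prediction per animal, can be sketched as follows. The per-cell labels would come from a trained classifier such as a random forest; here they are passed in directly, and the function name, label strings and tie-breaking behaviour are assumptions.

```python
from collections import Counter
from typing import Dict, List


def predict_groups(cell_calls: Dict[str, List[str]]) -> Dict[str, str]:
    """Turn per-cell class predictions into one prediction per mouse by majority vote.

    cell_calls maps a mouse identifier to the predicted class of each of its cells.
    On a tie, the class that was seen first among the tied classes wins.
    """
    return {mouse: Counter(calls).most_common(1)[0][0]
            for mouse, calls in cell_calls.items()}
```

With this rule a mouse whose cells are called `["survived", "died", "survived"]` is assigned to the `"survived"` group even though one cell was classified differently.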


Subject(s)
Immune System Diseases , Sepsis , Humans , Mice , Animals , Cytokines , Inflammation , T-Lymphocytes , Sepsis/diagnosis , Disease Models, Animal
4.
IEEE/ACM Trans Comput Biol Bioinform ; 19(5): 2770-2781, 2022.
Article in English | MEDLINE | ID: mdl-34166198

ABSTRACT

An in-depth exploration of gene prognosis using different methodologies aids in understanding the biological regulation of genes in disease pathobiology and their molecular functions. Interpreting gene functions at biological and molecular levels remains a daunting yet crucial task in domains such as drug design, personalized medicine, and next-generation diagnostics. Recent advancements in omics technologies have produced diverse heterogeneous genomic datasets, such as microarray gene expression, miRNA expression, DNA sequences, and 3D structures, which are significant resources for understanding gene functions. In this paper, we propose a novel self-attention-based deep multi-modal model, named DeePROG, for the prognosis of disease-affected genes based on heterogeneous omics data. We use three NCBI datasets covering three modalities, namely the gene expression profile, the underlying DNA sequence, and the 3D protein structures. To extract useful features from each modality, we develop several context-specific deep learning models. In addition, we develop three attention-based deep bi-modal architectures along with DeePROG to support prognosis from the underlying biomedical data. We assess the performance of the models in terms of the Critical Assessment of Functional Annotation (CAFA2) metrics. Moreover, we analyze the results in terms of the receiver operating characteristic (ROC) curve in a highly class-imbalanced data setting and test statistical significance with Welch's t-test. Experimental results show that DeePROG significantly outperforms the baseline models across the performance metrics. The source code and all preprocessed datasets used in this study are available at https://github.com/duttaprat/DeePROG.
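Welch's t-test, used here to compare model scores, differs from Student's t-test in that it does not assume equal variances in the two groups. A minimal sketch of the statistic and its Welch-Satterthwaite degrees of freedom follows; the function name is an assumption, and turning the statistic into a P-value additionally needs the t distribution, which is not part of the Python standard library.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom.

    a and b are two samples (for example, scores of two models over repeated
    runs); each needs at least two observations.
    """
    # Squared standard error of each sample mean (sample variance / sample size).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Degrees of freedom are not simply n_a + n_b - 2 when variances differ.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

For `a = [1, 2, 3, 4, 5]` and `b = [2, 4, 6, 8, 10]` the statistic is about -1.90 with about 5.9 degrees of freedom, fewer than the 8 that the pooled-variance test would use.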


Subject(s)
MicroRNAs , Software , Genomics/methods , Proteins , Transcriptome
5.
Health Sci Rep ; 4(3): e345, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34386613

ABSTRACT

BACKGROUND AND AIMS: According to the World Health Organization (WHO), more than 75.7 million confirmed cases of coronavirus disease 2019 (COVID-19), a global pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), have been reported so far. Researchers are working relentlessly to find effective solutions to this catastrophe, using genomic sequence-based investigation, immunological analysis, and more. Health disparity has also emerged as a factor with a large impact on people's lives. METHODS: We analyzed various factors that drive health disparity in the United States of America, along with the rates of COVID-19 morbidity and mortality. We also focused on the State of Mississippi, which suffers from extreme health disparity. Data were obtained from publicly available sources, including the Centers for Disease Control and Prevention and the Mississippi State Department of Health. Correlation analysis of the dataset was performed using R software. RESULTS: Our analysis suggested that the COVID-19 infection rate per 100 000 people is directly correlated with a larger African American population in the United States. We also found a strong correlation between obesity and COVID-19 cases. Across the counties in Mississippi, a larger African American population is strongly correlated with COVID-19 cases and obesity. Our data also indicate that African American populations more often face socioeconomic disadvantages, which increase their vulnerability to pre-existing conditions such as obesity, type-2 diabetes, and cardiovascular diseases. CONCLUSION: We proposed a possible explanation of increased COVID-19 infectivity in the African American population in the United States.
This work has highlighted the intriguing factors that increased the health disparity at the time of the COVID-19 pandemic.
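The abstract does not say which correlation coefficient was computed. Assuming the Pearson coefficient (the default of R's `cor` function), the calculation can be sketched as below; the function name is an assumption, and any inputs used with it here are synthetic values, not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # Unnormalized standard deviations; the 1/n factors cancel in the ratio.
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A value near 1 indicates that the two variables rise together across the units of analysis (for example, counties), a value near -1 that one falls as the other rises. Neither says anything about causation on its own.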

6.
IEEE J Biomed Health Inform ; 25(5): 1832-1838, 2021 05.
Article in English | MEDLINE | ID: mdl-32897865

ABSTRACT

Proteins are essential macronutrients that carry out a wide range of biochemical activities and biological regulation in living cells. In this work, we present a novel multi-modal approach, named MultiPredGO, for predicting protein functions from two different kinds of information, namely the protein sequence and the protein secondary structure. Our contributions are threefold. First, along with the protein sequence, we learn a feature representation from the protein structure. Second, we develop two different deep learning models after considering the characteristics of the underlying data patterns of the protein sequence and protein 3D structures. Finally, along with these two modalities, we also use protein interaction information to improve the efficiency of the proposed model in predicting protein functions. To extract features from the different modalities, we use several variants of the convolutional neural network. As the protein function classes depend on each other, we use a neuro-symbolic hierarchical classification model, which resembles the structure of the Gene Ontology (GO), to predict the dependent protein functions effectively. To validate the proposed method (MultiPredGO), we compare our results with several uni-modal approaches and with two well-known multi-modal protein function prediction approaches, namely INGA and DeepGO. Results show that the overall performance of the proposed approach in terms of accuracy, F-measure, precision, and recall is better than that of the state-of-the-art methods. MultiPredGO attains average improvements of 13.05% and 30.87% over the best existing comparison approach (DeepGO) for cellular component and molecular function, respectively.
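Because GO classes depend on each other, a prediction for a specific term implies its more general ancestors. One common way to enforce this on a set of scores is the true-path rule, sketched below. This is a generic post-processing illustration, not the neuro-symbolic model described in the abstract; the function name and the term names in the example are hypothetical.

```python
def propagate_scores(scores, parents):
    """Make prediction scores consistent with a term hierarchy (true-path rule).

    scores  maps a term to its predicted score.
    parents maps a term to the list of its direct parent terms.
    Every ancestor ends up with a score at least as high as any of its descendants.
    """
    result = dict(scores)
    changed = True
    while changed:  # repeat until no score needs to be raised any more
        changed = False
        for term, direct_parents in parents.items():
            for parent in direct_parents:
                if result.get(term, 0.0) > result.get(parent, 0.0):
                    result[parent] = result[term]
                    changed = True
    return result
```

Given scores `{"root": 0.2, "child": 0.5, "grandchild": 0.9}` in a three-level chain, all three terms end up with 0.9, because a confident prediction for the most specific term carries over to its ancestors.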


Subject(s)
Neural Networks, Computer , Proteins
7.
Comput Biol Med ; 125: 103965, 2020 10.
Article in English | MEDLINE | ID: mdl-32931989

ABSTRACT

Deciphering patterns in the structural and functional anatomy of genes can be very helpful in understanding genetic biology and genomics. The availability of multiple omics data, along with the advent of machine learning techniques, also helps medical professionals gain insights into various biological regulations. Gene clustering is one of many such computational techniques that can help in understanding gene behavior. More comprehensive and reliable insights can be gained if different modalities/views of biomedical data are considered. In most multi-view cases, however, each view contains some missing data, leading to incomplete multi-view clustering. In this study, we present a deep Boltzmann machine-based incomplete multi-view clustering framework for gene clustering. We seek to regenerate the data of three NCBI datasets in the incomplete modalities using Shape Boltzmann Machines. The overall performance of the proposed multi-view clustering technique is evaluated using the Silhouette index and the Davies-Bouldin index, and the comparative analysis shows an improvement over state-of-the-art methods. Finally, to show that the improvement attained by the proposed incomplete multi-view clustering is statistically significant, we perform Welch's t-test. AVAILABILITY OF DATA AND MATERIALS: https://github.com/piyushmishra12/IMC.
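The Silhouette index used for evaluation compares, for each item, its mean distance to its own cluster with its mean distance to the nearest other cluster. A minimal sketch with Euclidean distance follows; the function name and the convention of scoring singleton clusters as 0 are assumptions, and the function needs at least two clusters.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_index(points, labels):
    """Mean silhouette value over all points; ranges from -1 (poor) to 1 (good)."""
    clusters = set(labels)
    values = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of the point's own cluster.
        same = [dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own and j != i]
        if not same:
            values.append(0.0)  # assumed convention for singleton clusters
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(dist(p, q) for q, lab in zip(points, labels) if lab == other)
                / labels.count(other)
                for other in clusters if other != own)
        values.append((b - a) / max(a, b))
    return sum(values) / len(values)
```

Four points forming two tight pairs, `[(0, 0), (0, 1), (10, 0), (10, 1)]`, score about 0.90 when the pairs are the clusters and a negative value when each cluster mixes one point from each pair.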


Subject(s)
Algorithms , Machine Learning , Cluster Analysis , Genomics
8.
Sci Rep ; 10(1): 665, 2020 01 20.
Article in English | MEDLINE | ID: mdl-31959782

ABSTRACT

In computational bioinformatics, identifying the set of genes responsible for a particular cellular mechanism is essential for tasks such as medical diagnosis and disease gene identification. Accurately grouping (clustering) the genes is an important task in understanding the functions of disease genes. In this regard, ensemble clustering is a promising approach that combines different clustering solutions to generate a nearly accurate gene partitioning. Recently, researchers have used generative models as an ensemble method to produce a consensus solution. In the current paper, we develop a protein-protein interaction-based generative model that can efficiently perform gene clustering. Using protein interaction information as the generative model's latent variable enhances the model's efficiency in inferring the final probabilistic labels. The proposed generative model uses different weak supervision sources rather than any ground truth information. As weak supervision sources, we use a multi-objective optimization-based clustering technique together with the Gene Ontology Consortium (GOC) resource, the world's largest gene ontology-based knowledge base. These weakly supervised labels are supplied to a generative model that assigns probabilistic labels to all genes. A comparative study with respect to the silhouette score, Biological Homogeneity Index (BHI), and Biological Stability Index (BSI) shows that the proposed generative model outperforms other state-of-the-art techniques.
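The Biological Homogeneity Index is commonly defined as the fraction of gene pairs within a cluster that share a functional annotation, averaged over clusters. The sketch below follows that common definition and is not taken from the paper's code; the function name, the gene and annotation identifiers, and the choice to skip clusters with fewer than two annotated genes are assumptions.

```python
from itertools import combinations


def biological_homogeneity_index(clusters, annotations):
    """Average, over clusters, of the share of gene pairs with a common annotation.

    clusters    is a list of gene lists.
    annotations maps a gene to the set of functional classes it is annotated with.
    """
    per_cluster = []
    for genes in clusters:
        annotated = [g for g in genes if annotations.get(g)]
        pairs = list(combinations(annotated, 2))
        if not pairs:
            continue  # assumed convention: such clusters do not contribute
        shared = sum(1 for x, y in pairs if annotations[x] & annotations[y])
        per_cluster.append(shared / len(pairs))
    return sum(per_cluster) / len(per_cluster) if per_cluster else 0.0
```

A value close to 1 means that genes placed in the same cluster tend to share functional classes, so higher values indicate more biologically coherent clusters.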


Subject(s)
Cluster Analysis , Genomics , Models, Genetic , Multigene Family , Protein Interaction Domains and Motifs/genetics , Transcriptome , Computational Biology , Datasets as Topic , Gene Ontology , Humans , Models, Statistical
9.
IEEE/ACM Trans Comput Biol Bioinform ; 17(6): 2005-2016, 2020.
Article in English | MEDLINE | ID: mdl-31135367

ABSTRACT

Cluster ensemble techniques aim to combine the outputs of multiple clustering algorithms to obtain a single consensus partitioning. The current paper reports the development of a cluster ensemble-based technique that combines multi-objective optimization and deep learning models for gene clustering, where additional protein-protein interaction information is used to generate the consensus partitioning. The proposed ensemble-based framework works in four phases: (i) filtering out the irrelevant genes from the microarray dataset: only the statistically significant genes are considered for further data analysis; (ii) generation of diverse base partitionings: a multi-objective optimization-based clustering technique is proposed which simultaneously optimizes three different cluster quality measures and generates a set of partitioning solutions on the Pareto optimal front; (iii) generation of a weighted incidence matrix: mentha scores of the different clustering solutions, calculated from a highly enriched protein-protein interaction archive named mentha, are used to weight the matrix; (iv) finally, two approaches are used to generate a consensus partitioning from the obtained incidence matrix. The first approach is based on a traditional machine learning method, and the second exploits a graph partitioning algorithm and two deep neural models to generate the final clustering. To validate the efficacy of the proposed ensemble framework, it is applied to five gene expression datasets. We present a comparative analysis of the proposed technique against different clustering algorithms in terms of the biological homogeneity index (BHI) and biological stability index (BSI).
The traditional approach attains average improvements of 3 and 2 percent over the best non-dominated solution with respect to BHI and BSI, respectively, whereas the deep learning models show average improvements of 6.8 and 1.5 percent over the proposed traditional approach with respect to BHI and BSI, respectively. Subsequently, Welch's t-test is performed to show that the results obtained by the proposed methods are statistically significant. Availability of data and materials: https://github.com/sduttap16/DeepEnsm.
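The abstract does not spell out how the weighted incidence matrix is built. One plausible reading is a weighted co-association matrix, in which each base partitioning votes, with its own weight, on whether two genes belong together. The sketch below follows that assumption; the function name is hypothetical, and the weights stand in for whatever per-partition quality score (for example, a mentha-based score) is available.

```python
def weighted_coassociation(partitions, weights):
    """Weighted fraction of base partitionings that place items i and j together.

    partitions is a list of label lists, one per base clustering, all of equal length.
    weights    holds one non-negative weight per base clustering.
    Returns an n x n matrix with entries between 0 and 1.
    """
    n = len(partitions[0])
    total = sum(weights)
    matrix = [[0.0] * n for _ in range(n)]
    for labels, w in zip(partitions, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += w / total
    return matrix
```

The resulting matrix can be treated as a similarity matrix and handed to any clustering or graph partitioning method to obtain a consensus partitioning. Partitionings with larger weights have more influence on which genes end up together.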


Subject(s)
Computational Biology/methods , Deep Learning , Multigene Family/genetics , Protein Interaction Mapping/methods , Protein Interaction Maps/genetics , Cluster Analysis , Phylogeny
10.
IEEE J Biomed Health Inform ; 23(6): 2670-2676, 2019 11.
Article in English | MEDLINE | ID: mdl-30676987

ABSTRACT

Classification of gene expression profile samples plays a significant role in the prediction and diagnosis of diseases. In sample classification, a robust feature selection algorithm is essential for identifying the important genes in high-dimensional gene expression data. This paper combines protein-protein interaction information with a graph mining technique to find a proper subset of features (genes), which is then used in sample classification. Our contribution to feature selection is threefold. First, all the genes are grouped into clusters based on the integrated information of the gene expression values and their protein interactions, using a multi-objective optimization-based clustering approach. Second, the confidence scores of the protein interactions are incorporated into a popular graph mining algorithm, the Goldberg algorithm, to find the relevant features. These features are the topologically and functionally significant genes, referred to as hub genes. Finally, these hub genes are identified by varying the degrees of the nodes and are used for the sample classification task. Different machine learning classifiers are used for this purpose, and the classification performance is measured with respect to accuracy, sensitivity, specificity, precision, F-measure, and the Matthews correlation coefficient. Comparative analysis with respect to two baselines and several existing approaches demonstrates the efficiency of the proposed approach. Furthermore, the robustness of the identified hub-gene modules is supported by biological significance analysis.
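Of the metrics listed, the Matthews correlation coefficient is the least familiar. It summarizes a binary confusion matrix in a single value between -1 and 1 and stays informative when the classes are imbalanced. A minimal sketch follows; the function name and the convention of returning 0 when the denominator vanishes are assumptions.

```python
from math import sqrt


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from the four confusion-matrix counts."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator
```

For 6 true positives, 3 true negatives, 1 false positive and 2 false negatives the coefficient is about 0.48, noticeably lower than the accuracy of 0.75 computed from the same counts.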


Subject(s)
Gene Expression Profiling/methods , Protein Interaction Maps/genetics , Transcriptome/genetics , Algorithms , Cluster Analysis , Humans , Machine Learning
11.
Comput Biol Med ; 89: 31-43, 2017 10 01.
Article in English | MEDLINE | ID: mdl-28783536

ABSTRACT

One of the crucial problems in functional genomics is to identify the set of genes responsible for a particular cellular mechanism. The current work explores the use of a multi-objective optimization-based genetic clustering technique to classify genes into groups with respect to their functional similarities and biological relevance. Our contribution is twofold. First, a new quality measure of gene-cluster goodness, the protein-protein interaction confidence score, is developed. It uses the confidence scores of protein-protein interaction networks to measure the similarity between genes of a particular cluster with respect to their biochemical protein products. Second, a multi-objective clustering approach is developed that uses the integrated information of the expression values of the microarray dataset and the protein-protein interaction confidence scores to select genes that are both statistically and biologically relevant. For this purpose, two biological cluster validity indices, viz. the biological homogeneity index and the protein-protein interaction confidence score, along with two traditional internal cluster validity indices, viz. the fuzzy partition coefficient and the Pakhira-Bandyopadhyay-Maulik index, are simultaneously optimized during the clustering process. Experimental results on three real-life gene expression datasets show that adding the new objective, which captures protein-protein interaction information, improves gene clustering compared with existing techniques. The observations are further supported by biological and statistical significance tests.
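Of the internal validity indices named in the abstract, the fuzzy partition coefficient has the simplest form: the mean, over items, of the sum of squared cluster memberships. A minimal sketch follows; the function name and the row-per-item layout of the membership matrix are assumptions.

```python
def fuzzy_partition_coefficient(memberships):
    """Mean over items of the sum of squared membership degrees.

    memberships holds one row per item; each row lists the item's membership in
    every cluster and sums to 1. The result ranges from 1 / (number of clusters)
    for a maximally fuzzy partition to 1 for a crisp one.
    """
    return sum(u * u for row in memberships for u in row) / len(memberships)
```

A crisp assignment such as `[[1.0, 0.0], [0.0, 1.0]]` gives 1.0, whereas completely undecided memberships `[[0.5, 0.5], [0.5, 0.5]]` give 0.5, the minimum for two clusters.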


Subject(s)
Databases, Nucleic Acid , Databases, Protein , Gene Expression Profiling/methods , Gene Expression Regulation , Microarray Analysis , Models, Biological , Humans