1.
Curr Opin Struct Biol ; 84: 102747, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38091924

ABSTRACT

Drug response prediction is essential for drug development and disease treatment. One key question in predicting drug response is the representation of molecules, which has been greatly advanced by artificial intelligence (AI) techniques in recent years. In this review, we first describe different types of representation methods, pinpointing their key principles and discussing their limitations. Thereafter, we discuss potential ways in which these methods could be further developed. We expect that this review will provide useful guidance for researchers in the community.


Subject(s)
Artificial Intelligence, Pharmaceutical Preparations
2.
Bioinformatics ; 39(39 Suppl 1): i368-i376, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387178

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) offers a powerful tool to dissect the complexity of biological tissues through cell sub-population identification in combination with clustering approaches. Feature selection is a critical step for improving the accuracy and interpretability of single-cell clustering. Existing feature selection methods underutilize the discriminatory potential of genes across distinct cell types. We hypothesize that incorporating such information could further boost the performance of single-cell clustering. RESULTS: We develop CellBRF, a feature selection method that considers genes' relevance to cell types for single-cell clustering. The key idea is to identify genes that are most important for discriminating cell types through random forests guided by predicted cell labels. Moreover, it uses a class balancing strategy to mitigate the impact of unbalanced cell type distributions on feature importance evaluation. We benchmark CellBRF on 33 scRNA-seq datasets representing diverse biological scenarios and demonstrate that it substantially outperforms state-of-the-art feature selection methods in terms of clustering accuracy and cell neighborhood consistency. Furthermore, we demonstrate the outstanding performance of our selected features through three case studies on cell differentiation stage identification, non-malignant cell subtype identification, and rare cell identification. CellBRF provides a new and effective tool to boost single-cell clustering accuracy. AVAILABILITY AND IMPLEMENTATION: All source code for CellBRF is freely available at https://github.com/xuyp-csu/CellBRF.
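
The class balancing idea mentioned in the abstract can be illustrated with a small sketch. The abstract does not describe CellBRF's actual balancing scheme, so the function below is only an assumption-laden stand-in: it resamples cells so that every predicted cluster contributes the same number of samples (the median cluster size by default) before any feature importance would be computed. The name `balance_classes` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def balance_classes(labels, target=None, seed=0):
    """Return sample indices in which every class has `target` members.

    Classes with at least `target` members are downsampled without
    replacement; smaller classes keep all members and are topped up by
    sampling with replacement. `target` defaults to the median class size.
    Illustrative scheme only, not CellBRF's own strategy.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    if target is None:
        sizes = sorted(len(members) for members in by_class.values())
        target = sizes[len(sizes) // 2]
    balanced = []
    for lab in sorted(by_class):
        members = by_class[lab]
        if len(members) >= target:
            balanced.extend(rng.sample(members, target))
        else:
            extra = rng.choices(members, k=target - len(members))
            balanced.extend(members + extra)
    return balanced


# Toy predicted cell labels with a strongly unbalanced distribution.
predicted = ["T"] * 50 + ["B"] * 10 + ["NK"] * 3
idx = balance_classes(predicted)
balanced_labels = [predicted[i] for i in idx]
```

With the toy labels above, each of the three clusters ends up with 10 cells, so a rare population such as "NK" carries the same weight as an abundant one when importance scores are estimated.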


Subject(s)
Benchmarking, Random Forest, Cell Differentiation, Cluster Analysis
3.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-36987778

ABSTRACT

Alternative splicing (AS) is a key transcriptional regulation pathway. Recent studies have shown that AS events are associated with the occurrence of complex diseases. Various computational approaches have been developed for the detection of disease-associated AS events. In this review, we first describe the metrics used for quantitative characterization of AS events. Second, we review and discuss the three types of methods for detecting disease-associated splicing events: differential splicing analysis, aberrant splicing detection and splicing-related network analysis. Third, to further exploit the genetic mechanism of disease-associated AS events, we describe the methods for detecting genetic variants that potentially regulate splicing. For each type of method, we conducted an experimental comparison to illustrate its performance. Finally, we discuss the limitations of these methods and point out potential ways to address them. We anticipate that this review will provide a systematic understanding of computational approaches for the analysis of disease-associated splicing.
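
The abstract does not name the metrics it reviews. One widely used measure for quantifying an AS event is percent spliced in (PSI), the fraction of reads supporting inclusion of the event. The sketch below assumes PSI is a relevant example and shows how a simple difference between two conditions could be computed; the function names and read counts are hypothetical.

```python
def percent_spliced_in(inclusion_reads, exclusion_reads):
    """PSI = inclusion / (inclusion + exclusion); None if there is no coverage."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return inclusion_reads / total


def delta_psi(case, control):
    """Difference in PSI between two (inclusion, exclusion) read-count pairs."""
    psi_case = percent_spliced_in(*case)
    psi_control = percent_spliced_in(*control)
    if psi_case is None or psi_control is None:
        return None
    return psi_case - psi_control


# Toy counts: the event is mostly included in cases and mostly skipped in controls.
psi_example = percent_spliced_in(30, 10)
dpsi_example = delta_psi((30, 10), (10, 30))
```

A differential splicing analysis would add a statistical test on top of such differences; this sketch covers only the descriptive metric.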


Subject(s)
Alternative Splicing, Computational Biology
4.
Article in English | MEDLINE | ID: mdl-36240041

ABSTRACT

Identifying disease-associated genes is important for studying the pathogenic mechanisms of complex diseases. Recent models for disease gene prediction are predominantly based on molecular expression data and networks, including gene expression, protein expression, co-expression networks, protein-protein interaction networks, etc. One limitation of these methods is that they do not consider the knowledge of annotated gene sets representing known pathways or functionally related sets of genes. In this study, we propose a new approach to predict disease-associated genes by integrating annotated gene set data from the Molecular Signature Database (MSigDB). It first represents and integrates the different types of annotated gene sets in MSigDB in the form of a signal matrix. It then uses the signal matrix as the gene feature to train the disease gene prediction model. We compare our method with existing methods in predicting genes for five complex diseases. The results show that our method is superior to other methods. Further, we perform a case study on autism spectrum disorder (ASD). We find that the predicted genes are associated with ASD, based on the statistical analysis of biological networks and independent ASD studies. The source code, prediction results and datasets are publicly available at https://github.com/genemine/GSI.git.
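
The abstract does not define how the signal matrix is built. As a minimal sketch, assuming it is a binary gene-by-gene-set membership matrix (an assumption, not the authors' stated construction), annotated gene sets could be turned into per-gene feature vectors as follows. The set and gene names are placeholders.

```python
def build_signal_matrix(gene_sets, genes=None):
    """Build a binary membership matrix with one row per gene.

    gene_sets maps a set name to a set of gene symbols. Entry [i][j] is 1
    when gene i belongs to gene set j, and 0 otherwise.
    """
    set_names = sorted(gene_sets)
    if genes is None:
        genes = sorted(set().union(*gene_sets.values()))
    matrix = [
        [1 if gene in gene_sets[name] else 0 for name in set_names]
        for gene in genes
    ]
    return genes, set_names, matrix


# Placeholder annotations standing in for MSigDB collections.
toy_sets = {
    "PATHWAY_A": {"GENE1", "GENE2"},
    "PATHWAY_B": {"GENE2", "GENE3"},
}
genes, set_names, matrix = build_signal_matrix(toy_sets)
```

Each row of `matrix` could then serve as the feature vector of one gene when training a classifier on known disease genes.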

5.
Bioinformatics ; 38(20): 4806-4808, 2022 10 14.
Article in English | MEDLINE | ID: mdl-36000853

ABSTRACT

MOTIVATION: Gene-centric bioinformatics studies frequently involve the calculation or the extraction of various features of genes such as splice sites, promoters, independent introns and untranslated regions (UTRs) through manipulation of gene models. Gene models are often annotated in gene transfer format (GTF) files. The features are essential for subsequent analysis such as intron retention detection, DNA-binding site identification and computing splicing strength of splice sites. Some features such as independent introns and splice sites are not provided in existing resources including the commonly used BioMart database. A package that implements and integrates functions to analyze various features of genes will greatly ease routine analysis for related bioinformatics studies. However, to the best of our knowledge, such a package is not available yet. RESULTS: We introduce GTFtools, a stand-alone command-line tool that provides a set of functions to calculate various gene features, including splice sites, independent introns, transcription start site (TSS)-flanking regions, UTRs, isoform coordinates and length, different types of gene lengths, etc. It takes ENSEMBL or GENCODE GTF files as input and can be applied to both human and non-human gene models, such as those of the laboratory mouse. We compare the utilities of GTFtools with those of two related tools: Bedtools and BioMart. GTFtools is implemented in Python and does not depend on any third-party software, making it very easy to install and use. AVAILABILITY AND IMPLEMENTATION: GTFtools is freely available at www.genemine.org/gtftools.php as well as on PyPI and Bioconda.
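
To make the kind of gene-model manipulation described above concrete, here is a minimal sketch that derives intron coordinates for a transcript from its exon records in GTF text. It is not GTFtools code and does not reproduce its command-line interface; the helper names and the toy records are hypothetical. It relies only on the standard GTF layout (nine tab-separated columns, 1-based inclusive coordinates, attributes in column nine) and computes plain per-transcript introns, not the independent introns that GTFtools reports.

```python
def parse_exons(gtf_lines):
    """Collect exon (start, end) pairs per transcript_id from GTF text lines."""
    exons = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = {}
        for item in fields[8].strip().split(";"):
            item = item.strip()
            if item:
                key, _, value = item.partition(" ")
                attrs[key] = value.strip('"')
        transcript_id = attrs.get("transcript_id")
        if transcript_id is not None:
            span = (int(fields[3]), int(fields[4]))
            exons.setdefault(transcript_id, []).append(span)
    return exons


def introns_from_exons(exon_list):
    """Return the gaps between consecutive exons as 1-based inclusive intervals."""
    ordered = sorted(exon_list)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        if next_start - prev_end > 1
    ]


# Toy records for one three-exon transcript (placeholder identifiers).
toy_gtf = [
    'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\ttoy\texon\t301\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\ttoy\texon\t501\t600\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
]
toy_introns = introns_from_exons(parse_exons(toy_gtf)["T1"])
```

For a plus-strand transcript, the first and last positions of each interval correspond to the intron side of the donor and acceptor splice sites, respectively.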


Subject(s)
Computational Biology, Software, DNA, Introns, Untranslated Regions
6.
Bioinformatics ; 38(7): 2030-2032, 2022 03 28.
Article in English | MEDLINE | ID: mdl-35040932

ABSTRACT

MOTIVATION: Alzheimer's disease (AD) is a complex brain disorder whose risk genes are incompletely identified. Candidate genes are predominantly obtained by computational approaches. To gain biological insights into candidate genes or to screen genes for experimental testing, it is essential to assess their relevance to AD. A platform that integrates different types of omics data and approaches would facilitate the analysis of candidate genes and is greatly needed. RESULTS: We report AlzCode, a platform for multiview analysis of genes related to AD. First, this platform integrates a rich collection of functional genomic data, including expression data of AD samples (gene expression, single-cell RNA-seq data and protein expression), AD-specific biological networks (co-expression networks and functional gene networks), neuropathological and clinical traits (CERAD score, Braak staging score, Clinical Dementia Rating, cognitive function and clinical severity) and general data such as protein-protein interaction, regulatory networks, sequence similarity and miRNA-target interactions. These data provide a basis for analyzing genes from different views. Second, the platform integrates multiple approaches designed for the various types of data. We implement functions to analyze both individual genes and gene sets. We also compare AlzCode with two existing platforms for AD analysis, Agora and AD Atlas. We pinpoint the features of each platform and highlight their differences. This platform would be valuable for understanding AD genetics and pathological mechanisms. AVAILABILITY AND IMPLEMENTATION: AlzCode is freely available at: http://www.alzcode.xyz. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Alzheimer Disease, Humans, Alzheimer Disease/genetics, Gene Regulatory Networks, Genomics
7.
Nucleic Acids Res ; 50(D1): D710-D718, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34850130

ABSTRACT

Mapping gene interactions within tissues/cell types plays a crucial role in understanding the genetic basis of human physiology and disease. Tissue functional gene networks (FGNs) are essential models for mapping complex gene interactions. We present TissueNexus, a database of 49 human tissue/cell line FGNs constructed by integrating heterogeneous genomic data. We adopted an advanced machine learning approach for data integration because Bayesian classifiers, which are the main approach used for constructing existing tissue gene networks, cannot capture the interaction and nonlinearity of genomic features well. A total of 1,341 RNA-seq datasets containing 52,087 samples were integrated for all of these networks. Because the tissue label for RNA-seq data may be annotated with different names or be missing, we performed intensive hand-curation to improve quality. We further developed a user-friendly database for network search, visualization, and functional analysis. We illustrate the application of TissueNexus in prioritizing disease genes. The database is publicly available at https://www.diseaselinks.com/TissueNexus/.


Subject(s)
Genetic Databases, Gene Regulatory Networks/genetics, Organ Specificity/genetics, RNA-Seq, Data Curation, Data Management, Human Genome/genetics, Humans, Software
8.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34953465

ABSTRACT

Alzheimer's disease (AD) has a strong genetic predisposition. However, its risk genes remain incompletely identified. We developed an Alzheimer's brain gene network-based approach to predict AD-associated genes by leveraging the functional pattern of known AD-associated genes. Our constructed network outperformed existing networks in predicting AD genes. We then systematically validated the predictions using independent genetic, transcriptomic, proteomic, neuropathological and clinical data. First, top-ranked genes were enriched in AD-associated pathways. Second, using external gene expression data from the Mount Sinai Brain Bank study, we found that the top-ranked genes were significantly associated with neuropathological and clinical traits, including the Consortium to Establish a Registry for Alzheimer's Disease score, Braak stage score and clinical dementia rating. The analysis of Alzheimer's brain single-cell RNA-seq data revealed cell-type-specific association of predicted genes with early pathology of AD. Third, by interrogating proteomic data in the Religious Orders Study and Memory and Aging Project and the Baltimore Longitudinal Study of Aging, we observed a significant association of protein expression level with cognitive function and AD clinical severity. The network, method and predictions could become a valuable resource to advance the identification of risk genes for AD.


Subject(s)
Alzheimer Disease/genetics, Alzheimer Disease/metabolism, Brain/metabolism, Gene Regulatory Networks, Genetic Predisposition to Disease, Aging/genetics, Gene Expression Profiling, Humans, Longitudinal Studies, Memory, Proteomics, RNA-Seq, Transcriptome
9.
Front Genet ; 12: 665843, 2021.
Article in English | MEDLINE | ID: mdl-34386033

ABSTRACT

In recent years, single-cell RNA-seq (scRNA-seq) has become increasingly popular in fields such as biology and medical research. Analyzing scRNA-seq data can reveal complex cell populations and infer single-cell trajectories in cell development. Clustering is one of the most important methods for analyzing scRNA-seq data. In this paper, we focus on improving scRNA-seq clustering through gene selection, which also reduces the dimensionality of scRNA-seq data. Studies have shown that gene selection for scRNA-seq data can improve clustering accuracy. Therefore, it is important to select genes with cell type specificity, which can also improve cell type identification in combination with clustering methods. Here, we propose RFCell, a supervised gene selection method based on permutation and random forest classification. We first use RFCell and three existing gene selection methods to select gene sets on 10 scRNA-seq data sets. Then, three classical clustering algorithms are used to cluster the cells using the genes selected by these methods. We found that the gene selection performance of RFCell was better than that of the other gene selection methods.
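
The permutation idea behind this kind of gene selection can be sketched in a few lines. The example below is not RFCell: to stay dependency-free it swaps the random forest for a much simpler nearest-centroid classifier, and it assumes cell labels are already given. It only illustrates how shuffling one gene at a time and measuring the drop in accuracy yields an importance score. All names and the toy data are hypothetical.

```python
import random


def fit_centroids(X, y):
    """Mean expression vector per class (nearest-centroid stand-in for a random forest)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Assign a cell to the class whose centroid is closest in squared distance."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(centroids, key=squared_distance)


def accuracy(centroids, X, y):
    return sum(predict(centroids, row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(X, y, n_repeats=20, seed=0):
    """Average drop in accuracy after shuffling each gene (column) in turn."""
    rng = random.Random(seed)
    centroids = fit_centroids(X, y)
    baseline = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: gene 0 separates the two cell types, gene 1 carries no type signal.
X = [[0.0, 1.0 + (i % 2)] for i in range(10)] + [[10.0, 1.0 + (i % 2)] for i in range(10)]
y = ["A"] * 10 + ["B"] * 10
importances = permutation_importance(X, y)
```

Genes would then be ranked by importance and the top-scoring ones kept for clustering. In the toy data, shuffling gene 1 never changes a prediction, so its importance is zero, while shuffling gene 0 breaks the classifier.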

10.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34131702

ABSTRACT

In single-cell RNA-seq (scRNA-seq) data analysis, a fundamental problem is to determine the number of cell clusters based on the gene expression profiles. However, the performance of current methods is still far from satisfactory, presumably due to their limitations in capturing the expression variability among cell clusters. Batch effects represent the undesired variability between data measured in different batches; they occur when data are obtained from different labs or protocols. Motivated by the practice of batch effect removal, we considered cell clusters as batches. We hypothesized that the number of cell clusters (i.e. batches) could be correctly determined if the variances among clusters (i.e. batch effects) were removed. We developed a new method, namely, removal of batch effect and testing (REBET), for determining the number of cell clusters. In this method, cells are first partitioned into k clusters. Second, the batch effects among these k clusters are removed. Third, the quality of batch effect removal is evaluated with the average range of normalized mutual information (ARNMI), which measures how uniformly the cells are mixed after batch effect removal. By testing a range of k values, the k value that corresponds to the lowest ARNMI is determined to be the optimal number of clusters. We compared REBET with state-of-the-art methods on 32 simulated datasets and 14 published scRNA-seq datasets. The results show that REBET can accurately and robustly estimate the number of cell clusters and outperforms existing methods. Contact: H.D.L. (hongdong@csu.edu.cn) or Q.S.X. (qsxu@csu.edu.cn).
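
Two ingredients of the procedure above lend themselves to a short sketch: computing normalized mutual information (NMI) between two labelings, and picking the k with the lowest score. The abstract does not give the exact definition of ARNMI, so the code below shows only plain NMI (using the geometric-mean normalization, one of several common conventions) and a generic argmin step. The per-k scores are made-up placeholders, not results from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(a, b):
    """NMI = I(a; b) / sqrt(H(a) * H(b)); defined as 0.0 when either entropy is zero."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    mutual_info = sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 or h_b == 0:
        return 0.0
    return mutual_info / math.sqrt(h_a * h_b)


def pick_k(score_by_k):
    """Return the k whose score is lowest (REBET selects the lowest ARNMI)."""
    return min(score_by_k, key=score_by_k.get)


# Identical partitions (up to renaming) versus unrelated partitions.
nmi_same = normalized_mutual_information([0, 0, 1, 1], [1, 1, 0, 0])
nmi_unrelated = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])

# Placeholder scores for k = 2..4, for illustration only.
best_k = pick_k({2: 0.4, 3: 0.1, 4: 0.3})
```

A low score after batch effect removal indicates that the former clusters can no longer be told apart, which is the sense in which the cells are "uniformly mixed".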


Subject(s)
Cluster Analysis, RNA-Seq/methods, Single-Cell Analysis/methods, Algorithms, Genetic Databases, Reproducibility of Results
11.
Front Genet ; 11: 586, 2020.
Article in English | MEDLINE | ID: mdl-32733531

ABSTRACT

Intron retention (IR) is an alternative splicing mode whereby introns, rather than being spliced out as usual, are retained in mature mRNAs. It was previously considered a consequence of mis-splicing and received very limited attention. Only recently has IR become of interest for transcriptomic data analysis owing to its recognized roles in gene expression regulation and associations with complex diseases. In this article, we first review the function of IR in regulating gene expression in a number of biological processes, such as neuron differentiation and activation of CD4+ T cells. Next, we briefly review its association with diseases, such as Alzheimer's disease and cancers. Then, we describe state-of-the-art methods for IR detection, including RNA-seq analysis tools IRFinder and iREAD, highlighting their underlying principles and discussing their advantages and limitations. Finally, we discuss the challenges for IR detection and potential ways in which IR detection methods could be improved.

...