Search | BVS Paraguay Regional Portal

1.

scCAD: Cluster decomposition-based anomaly detection for rare cell identification in single-cell expression data.

Xu, Yunpei; Wang, Shaokai; Feng, Qilong; Xia, Jiazhi; Li, Yaohang; Li, Hong-Dong; Wang, Jianxin.

Nat Commun ; 15(1): 7561, 2024 Aug 31.

Article in English | MEDLINE | ID: mdl-39215003

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) technologies have become essential tools for characterizing cellular landscapes within complex tissues. Large-scale single-cell transcriptomics holds great potential for identifying rare cell types critical to the pathogenesis of diseases and biological processes. Existing methods for identifying rare cell types often rely on one-time clustering using partial or global gene expression. However, these rare cell types may be overlooked during the clustering phase, posing challenges for their accurate identification. In this paper, we propose a Cluster decomposition-based Anomaly Detection method (scCAD), which iteratively decomposes clusters based on the most differential signals in each cluster to effectively separate rare cell types and achieve accurate identification. We benchmark scCAD on 25 real-world scRNA-seq datasets, demonstrating its superior performance compared to 10 state-of-the-art methods. In-depth case studies across diverse datasets, including mouse airway, brain, intestine, human pancreas, immunology data, and clear cell renal cell carcinoma, showcase scCAD's efficiency in identifying rare cell types in complex biological scenarios. Furthermore, scCAD can correct the annotation of rare cell types and identify immune cell subtypes associated with disease, thereby offering valuable insights into disease progression.

Subject(s)

Single-Cell Analysis; Single-Cell Analysis/methods; Humans; Mice; Animals; Cluster Analysis; Gene Expression Profiling/methods; Sequence Analysis, RNA/methods; Algorithms; Transcriptome; Pancreas/metabolism; Pancreas/pathology; Pancreas/cytology; RNA-Seq/methods; Computational Biology/methods

2.

Identifying new cancer genes based on the integration of annotated gene sets via hypergraph neural networks.

Deng, Chao; Li, Hong-Dong; Zhang, Li-Shen; Liu, Yiwei; Li, Yaohang; Wang, Jianxin.

Bioinformatics ; 40(Suppl 1): i511-i520, 2024 06 28.

Article in English | MEDLINE | ID: mdl-38940121

ABSTRACT

MOTIVATION: Identifying cancer genes remains a significant challenge in cancer genomics research. Annotated gene sets encode functional associations among multiple genes, and cancer genes have been shown to cluster in hallmark signaling pathways and biological processes. The knowledge of annotated gene sets is critical for discovering cancer genes but remains to be fully exploited. RESULTS: Here, we present the DIsease-Specific Hypergraph neural network (DISHyper), a hypergraph-based computational method that integrates the knowledge from multiple types of annotated gene sets to predict cancer genes. First, our benchmark results demonstrate that DISHyper outperforms the existing state-of-the-art methods and highlight the advantages of employing hypergraphs for representing annotated gene sets. Second, we validate the accuracy of DISHyper-predicted cancer genes using functional validation results and multiple independent functional genomics data. Third, our model predicts 44 novel cancer genes, and subsequent analysis shows their significant associations with multiple types of cancers. Overall, our study provides a new perspective for discovering cancer genes and reveals previously undiscovered cancer genes. AVAILABILITY AND IMPLEMENTATION: DISHyper is freely available for download at https://github.com/genemine/DISHyper.

Subject(s)

Neoplasms; Neural Networks, Computer; Humans; Neoplasms/genetics; Computational Biology/methods; Genomics/methods; Genes, Neoplasm; Molecular Sequence Annotation/methods; Databases, Genetic

3.

Characterised intron retention profiles in muscle tissue of idiopathic inflammatory myopathy subtypes.

Xiao, Yizhi; Xie, Shasha; Li, Hong-Dong; Liu, Yanjuan; Zhang, Huali; Zuo, Xiaoxia; Zhu, Honglin; Li, Yisha; Luo, Hui.

Ann Rheum Dis ; 83(7): 901-914, 2024 Jun 12.

Article in English | MEDLINE | ID: mdl-38302260

ABSTRACT

OBJECTIVES: Idiopathic inflammatory myopathies (IIMs) are a group of heterogeneous autoimmune diseases. Intron retention (IR) serves as an important post-transcriptional and translational regulatory mechanism. This study aims to identify changes in IR profiles in IIM subtypes, investigating their influence on proteins and their correlations with clinical features. METHODS: RNA sequencing and liquid chromatography-tandem mass spectrometry were performed on muscle tissues obtained from 174 patients with IIM and 19 controls, following QC procedures. GTFtools and iREAD software were used for IR identification. An analysis of differentially expressed IRs (DEIs), exons and proteins was carried out using edgeR or DEP. Functional analysis was performed with clusterProfiler, and SPIRON was used to assess splicing factors. RESULTS: A total of 6783 IRs located in 3111 unique genes were identified in all IIM subtypes compared with controls. IIM subtype-specific DEIs were associated with the pathogenesis of respective IIM subtypes. Splicing factors YBX1 and HSPA2 exhibited the most changes in dermatomyositis and immune-mediated necrotising myopathy. Increased IR was associated with reduced protein expression. Some of the IIM-specific DEIs were correlated with clinical parameters (skin rash, MMT-8 scores and muscle enzymes) and muscle histopathological features (myofiber necrosis, regeneration and inflammation). IRs in IFIH1 and TRIM21 were strongly correlated with anti-MDA5+ antibody, while IRs in SRP14 were associated with anti-SRP+ antibody. CONCLUSION: This study revealed distinct IRs and specific splicing factors associated with IIM subtypes, which might be contributing to the pathogenesis of IIM. We also emphasised the potential impact of IR on protein expression in IIM muscles.

Subject(s)

Introns; Muscle, Skeletal; Myositis; Humans; Myositis/genetics; Myositis/immunology; Myositis/pathology; Male; Female; Muscle, Skeletal/pathology; Muscle, Skeletal/metabolism; Middle Aged; Introns/genetics; Adult; Dermatomyositis/genetics; Dermatomyositis/pathology; Dermatomyositis/metabolism; Dermatomyositis/immunology; Case-Control Studies; Aged; Sequence Analysis, RNA

4.

Artificial intelligence approaches for molecular representation in drug response prediction.

Lin, Cui-Xiang; Guan, Yuanfang; Li, Hong-Dong.

Curr Opin Struct Biol ; 84: 102747, 2024 Feb.

Article in English | MEDLINE | ID: mdl-38091924

ABSTRACT

Drug response prediction is essential for drug development and disease treatment. One key question in predicting drug response is the representation of molecules, which has been greatly advanced by artificial intelligence (AI) techniques in recent years. In this review, we first describe different types of representation methods, pinpointing their key principles and discussing their limitations. Thereafter, we discuss potential ways in which these methods could be further developed. We expect that this review will provide useful guidance for researchers in the community.

Subject(s)

Artificial Intelligence; Pharmaceutical Preparations

5.

Graph-Based Fusion of Imaging, Genetic and Clinical Data for Degenerative Disease Diagnosis.

Guo, Rui; Tian, Xu; Lin, Hanhe; McKenna, Stephen; Li, Hong-Dong; Guo, Fei; Liu, Jin.

IEEE/ACM Trans Comput Biol Bioinform ; 21(1): 57-68, 2024.

Article in English | MEDLINE | ID: mdl-37991907

ABSTRACT

Graph learning methods have achieved noteworthy performance in disease diagnosis due to their ability to represent unstructured information such as inter-subject relationships. While it has been shown that imaging, genetic and clinical data are crucial for degenerative disease diagnosis, existing methods rarely consider how best to use their relationships. How best to utilize information from imaging, genetic and clinical data remains a challenging problem. This study proposes a novel graph-based fusion (GBF) approach to meet this challenge. To extract effective imaging-genetic features, we propose an imaging-genetic fusion module which uses an attention mechanism to obtain modality-specific and joint representations within and between imaging and genetic data. Then, considering the effectiveness of clinical information for diagnosing degenerative diseases, we propose a multi-graph fusion module to further fuse imaging-genetic and clinical features, which adopts a learnable graph construction strategy and a graph ensemble method. Experimental results on two benchmarks for degenerative disease diagnosis (Alzheimer's Disease Neuroimaging Initiative and Parkinson's Progression Markers Initiative) demonstrate its effectiveness compared to state-of-the-art graph-based methods. Our findings should help guide further development of graph-based models for dealing with imaging, genetic and clinical data.

Subject(s)

Alzheimer Disease; Neuroimaging; Humans; Neuroimaging/methods; Alzheimer Disease/diagnostic imaging; Alzheimer Disease/genetics; Image Interpretation, Computer-Assisted/methods; Databases, Factual

6.

FEED: a feature selection method based on gene expression decomposition for single cell clustering.

Zhang, Chao; Duan, Zhi-Wei; Xu, Yun-Pei; Liu, Jin; Li, Hong-Dong.

Brief Bioinform ; 24(6), 2023 09 22.

Article in English | MEDLINE | ID: mdl-37935617

ABSTRACT

Single-cell clustering is a critical step in biological downstream analysis. The clustering performance could be effectively improved by extracting cell-type-specific genes. The state-of-the-art feature selection methods usually calculate the importance of a single gene without considering the information contained in the gene expression distribution. Moreover, these methods ignore the intrinsic expression patterns of genes and heterogeneity within groups of different mean expression levels. In this work, we present a Feature sElection method based on gene Expression Decomposition (FEED) of scRNA-seq data, which selects informative genes to enhance clustering performance. First, the expression levels of genes are decomposed into multiple Gaussian components. Then, a novel gene correlation calculation method is proposed to measure the relationship between genes from the perspective of distribution. Finally, a permutation-based approach is proposed to determine the threshold of gene importance to obtain marker gene subsets. Compared with state-of-the-art feature selection methods, applying FEED on various scRNA-seq datasets including large datasets followed by different common clustering algorithms results in significant improvements in the accuracy of cell-type identification. The source codes for FEED are freely available at https://github.com/genemine/FEED.

Subject(s)

Gene Expression Profiling; Single-Cell Analysis; Gene Expression Profiling/methods; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Algorithms; Cluster Analysis; Gene Expression

7.

IsoFrog: a reversible jump Markov Chain Monte Carlo feature selection-based method for predicting isoform functions.

Liu, Yiwei; Yang, Changhuo; Li, Hong-Dong; Wang, Jianxin.

Bioinformatics ; 39(9), 2023 09 02.

Article in English | MEDLINE | ID: mdl-37647643

ABSTRACT

MOTIVATION: A single gene may yield several isoforms with different functions through alternative splicing. Continuous efforts are devoted to developing machine-learning methods to predict isoform functions. However, existing methods do not consider the relevance of each feature to specific functions and ignore the noise caused by the irrelevant features. In this case, we hypothesize that constructing a feature selection framework to extract the function-relevant features might help improve the model accuracy in isoform function prediction. RESULTS: In this article, we present a feature selection-based approach named IsoFrog to predict isoform functions. First, IsoFrog adopts a reversible jump Markov Chain Monte Carlo (RJMCMC)-based feature selection framework to assess the feature importance to gene functions. Second, a sequential feature selection procedure is applied to select a subset of function-relevant features. This strategy screens the relevant features for the specific function while eliminating irrelevant ones, improving the effectiveness of the input features. Then, the selected features are input into our proposed method modified domain-invariant partial least squares, which prioritizes the most likely positive isoform for each positive MIG and utilizes diPLS for isoform function prediction. Tested on three datasets, our method achieves superior performance over six state-of-the-art methods, and the RJMCMC-based feature selection framework outperforms three classic feature selection methods. We expect this proposed methodology will promote the identification of isoform functions and further inspire the development of new methods. AVAILABILITY AND IMPLEMENTATION: IsoFrog is freely available at https://github.com/genemine/IsoFrog.

Subject(s)

Alternative Splicing; Machine Learning; Markov Chains; Protein Isoforms; Monte Carlo Method

8.

Comprehensive Analysis of 34 Edible Flowers by the Determination of Nutritional Composition and Antioxidant Capacity Planted in Yunnan Province China.

Zhang, Xing-Kai; Cao, Guan-Hua; Bi, Yue; Liu, Xiao-Hai; Yin, Hong-Mei; Zuo, Jia-Fang; Xu, Wen; Li, Hong-Dong; He, Sen; Zhou, Xu-Hong.

Molecules ; 28(13), 2023 Jul 06.

Article in English | MEDLINE | ID: mdl-37446920

ABSTRACT

The main purpose of this study was to reveal the nutritional value and antioxidant activity of 34 edible flowers grown in Yunnan Province, China, through a comprehensive assessment of their nutritional composition and antioxidant indices. The results showed that sample A3 of Asteraceae flowers had the highest total flavonoid content, with a value of 8.53%, and the maximum contents of vitamin C and reducing sugars were from Rosaceae sample R1 and Gentianaceae sample G3, with values of 143.80 mg/100 g and 7.82%, respectively. Samples R2 and R3 of Rosaceae were the top two flowers in terms of comprehensive nutritional quality. In addition, the antioxidant capacity of Rosaceae samples was evidently better than that of the other three families: sample R1 had the maximum hydroxyl radical (·OH) and superoxide anion radical (·O2-) scavenging rates, and samples R2 and R3 showed a high total antioxidant capacity and 2,2-diphenyl-1-picrylhydrazyl (DPPH) scavenging rate, respectively. Taken together, there were significant differences in the nutrient contents and antioxidant properties of these 34 flowers, and the comprehensive quality of Rosaceae samples was generally better than that of the other three families. This study provides references for 34 edible flowers to be used as dietary supplements and important sources of natural antioxidants.

Subject(s)

Antioxidants; Phenols; Humans; Antioxidants/chemistry; Phenols/chemistry; China; Flowers/chemistry; Flavonoids/chemistry; Plant Extracts/chemistry

9.

CellBRF: a feature selection method for single-cell clustering using cell balance and random forest.

Xu, Yunpei; Li, Hong-Dong; Lin, Cui-Xiang; Zheng, Ruiqing; Li, Yaohang; Xu, Jinhui; Wang, Jianxin.

Bioinformatics ; 39(39 Suppl 1): i368-i376, 2023 06 30.

Article in English | MEDLINE | ID: mdl-37387178

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) offers a powerful tool to dissect the complexity of biological tissues through cell sub-population identification in combination with clustering approaches. Feature selection is a critical step for improving the accuracy and interpretability of single-cell clustering. Existing feature selection methods underutilize the discriminatory potential of genes across distinct cell types. We hypothesize that incorporating such information could further boost the performance of single cell clustering. RESULTS: We develop CellBRF, a feature selection method that considers genes' relevance to cell types for single-cell clustering. The key idea is to identify genes that are most important for discriminating cell types through random forests guided by predicted cell labels. Moreover, it proposes a class balancing strategy to mitigate the impact of unbalanced cell type distributions on feature importance evaluation. We benchmark CellBRF on 33 scRNA-seq datasets representing diverse biological scenarios and demonstrate that it substantially outperforms state-of-the-art feature selection methods in terms of clustering accuracy and cell neighborhood consistency. Furthermore, we demonstrate the outstanding performance of our selected features through three case studies on cell differentiation stage identification, non-malignant cell subtype identification, and rare cell identification. CellBRF provides a new and effective tool to boost single-cell clustering accuracy. AVAILABILITY AND IMPLEMENTATION: All source codes of CellBRF are freely available at https://github.com/xuyp-csu/CellBRF.

Subject(s)

Benchmarking; Random Forest; Cell Differentiation; Cluster Analysis

10.

Computational approaches for detecting disease-associated alternative splicing events.

Liu, Jiashu; Lin, Cui-Xiang; Zhang, Xiaoqi; Li, Zongxuan; Huang, Wenkui; Liu, Jin; Guan, Yuanfang; Li, Hong-Dong.

Brief Bioinform ; 24(3), 2023 05 19.

Article in English | MEDLINE | ID: mdl-36987778

ABSTRACT

Alternative splicing (AS) is a key transcriptional regulation pathway. Recent studies have shown that AS events are associated with the occurrence of complex diseases. Various computational approaches have been developed for the detection of disease-associated AS events. In this review, we first describe the metrics used for quantitative characterization of AS events. Second, we review and discuss the three types of methods for detecting disease-associated splicing events, which are differential splicing analysis, aberrant splicing detection and splicing-related network analysis. Third, to further exploit the genetic mechanism of disease-associated AS events, we describe the methods for detecting genetic variants that potentially regulate splicing. For each type of method, we conducted an experimental comparison to illustrate their performance. Finally, we discuss the limitations of these methods and point out potential ways to address them. We anticipate that this review provides a systematic understanding of computational approaches for the analysis of disease-associated splicing.

Subject(s)

Alternative Splicing; Computational Biology

11.

IsoCell: An Approach to Enhance Single Cell Clustering by Integrating Isoform-Level Expression Through Orthogonal Projection.

Liu, Yingyi; Li, Hong-Dong; Xu, Yunpei; Liu, Yi-Wei; Peng, Xiaoqing; Wang, Jianxin.

IEEE/ACM Trans Comput Biol Bioinform ; 20(1): 465-475, 2023.

Article in English | MEDLINE | ID: mdl-35100120

ABSTRACT

Single cell RNA sequencing (scRNA-seq) provides a powerful approach for profiling transcriptomes at single cell resolution. An essential application of scRNA-seq is the discovery of cell types with the aid of clustering analysis. Currently, existing single cell clustering methods are exclusively based on gene-level expression data, without considering alternative splicing information. It has been shown that alternative splicing has an important influence on biological processes such as cell differentiation and cell cycle. We therefore hypothesize that adding information about alternative splicing may help enhance single cell clustering. This motivates us to develop a way to integrate isoform-level expression and gene-level expression. We report an approach to enhance single cell clustering by integrating isoform-level expression through orthogonal projection. First, we construct an orthogonal projection matrix based on gene expression data. Second, isoforms are projected to the gene space to remove the redundant information between them. Third, isoform selection is performed based on the residual of the projected expression and the selected isoforms are combined with gene expression data for subsequent clustering. We applied our method to sixteen scRNA-seq datasets. We find that alternative splicing contains differential information among cell types and can be integrated to enhance single cell clustering. Compared with using only gene-level expression data, the integration of isoform-level expression leads to better clustering performances for most of the datasets. The integration of isoform-level expression also has potential in the detection of novel cell subgroups. Our study shows that integrating isoform and gene-level expression is a promising way to improve single cell clustering. The IsoCell R package is freely available at both Github (https://github.com/genemine/IsoCell) and Zenodo (https://zenodo.org/record/4395707).

Subject(s)

Gene Expression Profiling; Single-Cell Analysis; Gene Expression Profiling/methods; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Protein Isoforms/genetics; Cluster Analysis
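The three steps named in the IsoCell abstract above (build a projection from gene expression, project isoforms onto the gene space, select isoforms by their residual) can be illustrated with a small toy sketch. This is not the IsoCell R package's code: the function names, the toy expression vectors, and the selection cutoff are all assumptions made for illustration, and Gram-Schmidt orthonormalization stands in for whatever projection matrix the authors construct.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: orthonormal basis for the span of the gene vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors already inside the span
            basis.append([wi / norm for wi in w])
    return basis


def residual_norm(isoform, basis):
    """Length of the part of `isoform` that the gene space cannot explain."""
    r = list(isoform)
    for b in basis:
        coeff = dot(r, b)
        r = [ri - coeff * bi for ri, bi in zip(r, b)]
    return math.sqrt(dot(r, r))


# Hypothetical toy data: expression across 4 cells.
genes = [[1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]]
isoforms = {
    "iso_redundant": [3.0, 5.0, 7.0, 9.0],  # exactly 2*gene1 + gene2
    "iso_novel": [1.0, -1.0, -1.0, 1.0],    # orthogonal to both genes
}

basis = orthonormal_basis(genes)
scores = {name: residual_norm(vec, basis) for name, vec in isoforms.items()}

CUTOFF = 1e-6  # assumed cutoff; a real analysis would choose this from the data
selected = [name for name, score in scores.items() if score > CUTOFF]
```

In this toy case only `iso_novel` is kept, because `iso_redundant` is a linear combination of the gene vectors and leaves essentially no residual.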

12.

A Comparison of Topologically Associating Domain Callers Based on Hi-C Data.

Liu, Kun; Li, Hong-Dong; Li, Yaohang; Wang, Jun; Wang, Jianxin.

IEEE/ACM Trans Comput Biol Bioinform ; 20(1): 15-29, 2023.

Article in English | MEDLINE | ID: mdl-35104223

ABSTRACT

Topologically associating domains (TADs) are local chromatin interaction domains, which have been shown to play an important role in gene expression regulation. TADs were originally discovered in the investigation of 3D genome organization based on High-throughput Chromosome Conformation Capture (Hi-C) data. Considerable and continuous efforts have been dedicated to developing methods for detecting TADs from Hi-C data. Different computational methods for TAD identification vary in their assumptions and criteria in calling TADs. As a consequence, the TADs called by these methods differ in their similarity and in the biological features in which they are enriched. In this work, we performed a systematic comparison of twenty-six TAD callers. We first compared the TADs and gaps between adjacent TADs across different methods, resolutions, and sequencing depths. We then assessed the quality of TADs and TAD boundaries according to three criteria: the decay of contact frequencies over the genomic distance, enrichment and depletion of regulatory elements around TAD boundaries, and reproducibility of TADs and TAD boundaries in replicate samples. Last, due to the lack of a gold standard of TADs, we also evaluated the performance of the methods on synthetic datasets. We discussed the key principles of TAD callers, and pinpointed the current situation in the detection of TADs. We provide a concise, comprehensive, and systematic framework for evaluating the performance of TAD callers, and expect our work will provide useful guidance in choosing suitable approaches for the detection and evaluation of TADs.

Subject(s)

Chromatin; Chromosomes; Reproducibility of Results; Chromatin/genetics; Chromosomes/genetics; Genome; Gene Expression Regulation

13.

A Gene Set-Integrated Approach for Predicting Disease-Associated Genes.

Li, Hong-Dong; Deng, Chao; Zhang, Xiao-Qi; Lin, Cui-Xiang.

IEEE/ACM Trans Comput Biol Bioinform ; PP, 2022 Oct 14.

Article in English | MEDLINE | ID: mdl-36240041

ABSTRACT

It is important to identify disease-associated genes for studying the pathogenic mechanism of complex diseases. Recently, models for disease gene prediction are dominantly based on molecular expression data and networks, including gene expression, protein expression, co-expression networks, protein-protein interaction networks, etc. One limitation of these methods is that they do not consider the knowledge of annotated gene sets representing known pathways or functionally-related sets of genes. In this study, we propose a new approach to predict disease-associated genes by integrating annotated gene sets data from the Molecular Signature Database (MSigDB). It first represents and integrates the different types of annotated gene sets in the MSigDB database in the form of the signal matrix. It then uses the signal matrix as the gene feature to train the disease gene prediction model. We compare our method with existing methods in predicting genes for five complex diseases. The results show that our method is superior to other methods. Further, we perform a case study on autism spectrum disorder (ASD). We find that ASD predictions are associated with ASD based on the statistical analysis of biological networks and independent ASD studies. The source code, prediction results and datasets are publicly available on https://github.com/genemine/GSI.git.

14.

GTFtools: a software package for analyzing various features of gene models.

Li, Hong-Dong; Lin, Cui-Xiang; Zheng, Jiantao.

Bioinformatics ; 38(20): 4806-4808, 2022 10 14.

Article in English | MEDLINE | ID: mdl-36000853

ABSTRACT

MOTIVATION: Gene-centric bioinformatics studies frequently involve the calculation or the extraction of various features of genes such as splice sites, promoters, independent introns and untranslated regions (UTRs) through manipulation of gene models. Gene models are often annotated in gene transfer format (GTF) files. The features are essential for subsequent analysis such as intron retention detection, DNA-binding site identification and computing splicing strength of splice sites. Some features such as independent introns and splice sites are not provided in existing resources including the commonly used BioMart database. A package that implements and integrates functions to analyze various features of genes will greatly ease routine analysis for related bioinformatics studies. However, to the best of our knowledge, such a package is not available yet. RESULTS: We introduce GTFtools, a stand-alone command-line software package that provides a set of functions to calculate various gene features, including splice sites, independent introns, transcription start sites (TSS)-flanking regions, UTRs, isoform coordination and length, different types of gene lengths, etc. It takes the ENSEMBL or GENCODE GTF files as input and can be applied to both human and non-human gene models like the lab mouse. We compare the utilities of GTFtools with those of two related tools: Bedtools and BioMart. GTFtools is implemented in Python and not dependent on any third-party software, making it very easy to install and use. AVAILABILITY AND IMPLEMENTATION: GTFtools is freely available at www.genemine.org/gtftools.php as well as PyPI and Bioconda.

Subject(s)

Computational Biology; Software; DNA; Introns; Untranslated Regions
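To make concrete the kind of gene-model manipulation the GTFtools abstract above describes, here is a minimal standard-library sketch that reads exon records from GTF-formatted text and derives the introns of each transcript. It does not use or reproduce GTFtools itself: the function names and the toy annotation are assumptions for illustration, and it computes plain per-transcript introns, not the "independent introns" feature named in the abstract. GTF coordinates are 1-based and inclusive, so an intron spans from the previous exon's end plus one to the next exon's start minus one.

```python
import re
from collections import defaultdict

# Hypothetical toy annotation: one transcript with three exons and one CDS row.
GTF_TEXT = """\
chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";
chr1\ttoy\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";
chr1\ttoy\texon\t550\t600\t.\t+\t.\tgene_id "G1"; transcript_id "T1";
chr1\ttoy\tCDS\t120\t200\t.\t+\t0\tgene_id "G1"; transcript_id "T1";
"""


def parse_exons(gtf_text):
    """Map each transcript_id to a list of (start, end) exon intervals."""
    exons = defaultdict(list)
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue  # keep only well-formed exon records
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        exons[match.group(1)].append((int(fields[3]), int(fields[4])))
    return exons


def introns(exon_list):
    """Gaps between consecutive exons, in 1-based inclusive coordinates."""
    ordered = sorted(exon_list)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        if next_start - prev_end > 1  # adjacent exons leave no intron
    ]


transcript_introns = {
    tid: introns(ex) for tid, ex in parse_exons(GTF_TEXT).items()
}
```

For the toy transcript, `transcript_introns["T1"]` holds the two gaps between its three exons; the CDS row is ignored because only `exon` features are collected.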

15.

AlzCode: a platform for multiview analysis of genes related to Alzheimer's disease.

Lin, Cui-Xiang; Li, Hong-Dong; Deng, Chao; Erhardt, Shannon; Wang, Jun; Peng, Xiaoqing; Wang, Jianxin.

Bioinformatics ; 38(7): 2030-2032, 2022 03 28.

Article in English | MEDLINE | ID: mdl-35040932

ABSTRACT

MOTIVATION: Alzheimer's disease (AD) is a complex brain disorder with risk genes incompletely identified. The candidate genes are dominantly obtained by computational approaches. In order to obtain biological insights into candidate genes or screen genes for experimental testing, it is essential to assess their relevance to AD. A platform that integrates different types of omics data and approaches would facilitate the analysis of candidate genes and is greatly needed. RESULTS: We report AlzCode, a platform for multiview analysis of genes related to AD. First, this platform integrates a rich collection of functional genomic data, including expression data of AD samples (gene expression, single-cell RNA-seq data and protein expression), AD-specific biological networks (co-expression networks and functional gene networks), neuropathological and clinical traits (CERAD score, Braak staging score, Clinical Dementia Rating, cognitive function and clinical severity) and general data such as protein-protein interaction, regulatory networks, sequence similarity and miRNA-target interactions. These data provide a basis for analyzing genes from different views. Second, the platform integrates multiple approaches designed for the various types of data. We implement functions to analyze both individual genes and gene sets. We also compare AlzCode with two existing platforms for AD analysis, which are Agora and AD Atlas. We pinpoint the features of each platform and highlight their differences. This platform would be valuable to the understanding of AD genetics and pathological mechanisms. AVAILABILITY AND IMPLEMENTATION: AlzCode is freely available at: http://www.alzcode.xyz. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Alzheimer Disease; Humans; Alzheimer Disease/genetics; Gene Regulatory Networks; Genomics

16.

An integrated brain-specific network identifies genes associated with neuropathologic and clinical traits of Alzheimer's disease.

Lin, Cui-Xiang; Li, Hong-Dong; Deng, Chao; Liu, Weisheng; Erhardt, Shannon; Wu, Fang-Xiang; Zhao, Xing-Ming; Guan, Yuanfang; Wang, Jun; Wang, Daifeng; Hu, Bin; Wang, Jianxin.

Brief Bioinform ; 23(1), 2022 01 17.

Article in English | MEDLINE | ID: mdl-34953465

ABSTRACT

Alzheimer's disease (AD) has a strong genetic predisposition. However, its risk genes remain incompletely identified. We developed an Alzheimer's brain gene network-based approach to predict AD-associated genes by leveraging the functional pattern of known AD-associated genes. Our constructed network outperformed existing networks in predicting AD genes. We then systematically validated the predictions using independent genetic, transcriptomic, proteomic data, neuropathological and clinical data. First, top-ranked genes were enriched in AD-associated pathways. Second, using external gene expression data from the Mount Sinai Brain Bank study, we found that the top-ranked genes were significantly associated with neuropathological and clinical traits, including the Consortium to Establish a Registry for Alzheimer's Disease score, Braak stage score and clinical dementia rating. The analysis of Alzheimer's brain single-cell RNA-seq data revealed cell-type-specific association of predicted genes with early pathology of AD. Third, by interrogating proteomic data in the Religious Orders Study and Memory and Aging Project and Baltimore Longitudinal Study of Aging studies, we observed a significant association of protein expression level with cognitive function and AD clinical severity. The network, method and predictions could become a valuable resource to advance the identification of risk genes for AD.

Subject(s)

Alzheimer Disease/genetics; Alzheimer Disease/metabolism; Brain/metabolism; Gene Regulatory Networks; Genetic Predisposition to Disease; Aging/genetics; Gene Expression Profiling; Humans; Longitudinal Studies; Memory; Proteomics; RNA-Seq; Transcriptome

17.

TissueNexus: a database of human tissue functional gene networks built with a large compendium of curated RNA-seq data.

Lin, Cui-Xiang; Li, Hong-Dong; Deng, Chao; Guan, Yuanfang; Wang, Jianxin.

Nucleic Acids Res ; 50(D1): D710-D718, 2022 01 07.

Article in English | MEDLINE | ID: mdl-34850130

ABSTRACT

Mapping gene interactions within tissues/cell types plays a crucial role in understanding the genetic basis of human physiology and disease. Tissue functional gene networks (FGNs) are essential models for mapping complex gene interactions. We present TissueNexus, a database of 49 human tissue/cell line FGNs constructed by integrating heterogeneous genomic data. We adopted an advanced machine learning approach for data integration because Bayesian classifiers, which are the main approach used for constructing existing tissue gene networks, cannot capture the interaction and nonlinearity of genomic features well. A total of 1,341 RNA-seq datasets containing 52,087 samples were integrated for all of these networks. Because the tissue label for RNA-seq data may be annotated with different names or be missing, we performed intensive hand-curation to improve quality. We further developed a user-friendly database for network search, visualization, and functional analysis. We illustrate the application of TissueNexus in prioritizing disease genes. The database is publicly available at https://www.diseaselinks.com/TissueNexus/.

Subject(s)

Databases, Genetic; Gene Regulatory Networks/genetics; Organ Specificity/genetics; RNA-Seq; Data Curation; Data Management; Genome, Human/genetics; Humans; Software

18.

RFCell: A Gene Selection Approach for scRNA-seq Clustering Based on Permutation and Random Forest.

Zhao, Yuan; Fang, Zhao-Yu; Lin, Cui-Xiang; Deng, Chao; Xu, Yun-Pei; Li, Hong-Dong.

Front Genet ; 12: 665843, 2021.

Article in English | MEDLINE | ID: mdl-34386033

ABSTRACT

In recent years, single-cell RNA-seq (scRNA-seq) has become increasingly popular in fields such as biology and medical research. Analyzing scRNA-seq data can reveal complex cell populations and infer single-cell trajectories in cell development. Clustering is one of the most important methods for analyzing scRNA-seq data. In this paper, we focus on improving scRNA-seq clustering through gene selection, which also reduces the dimensionality of scRNA-seq data. Studies have shown that gene selection for scRNA-seq data can improve clustering accuracy, so it is important to select genes with cell type specificity. Combined with clustering methods, gene selection can also improve cell type identification. Here, we propose RFCell, a supervised gene selection method based on permutation and random forest classification. We first use RFCell and three existing gene selection methods to select gene sets on 10 scRNA-seq data sets. Then, three classical clustering algorithms are used to cluster the cells using the gene sets obtained by these gene selection methods. We found that the gene selection performance of RFCell was better than that of the other gene selection methods.
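The abstract does not spell out how RFCell combines permutation with random forest classification, so the following is only a generic sketch of permutation-based feature importance for supervised gene scoring. To stay dependency-free, a nearest-centroid classifier is swapped in for the random forest; all function names, the toy expression matrix, and the scoring rule (mean drop in accuracy after shuffling one gene) are illustrative assumptions, not the authors' implementation.

```python
import random


def fit_centroids(X, y):
    """Return {cell_type: mean expression vector} (stand-in for a random forest)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, row):
    """Assign a cell to the closest centroid (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda lab: sum((v - c) ** 2 for v, c in zip(row, centroids[lab])),
    )


def accuracy(centroids, X, y):
    return sum(predict(centroids, r) == t for r, t in zip(X, y)) / len(y)


def permutation_importance(X, y, n_repeats=20, seed=0):
    """Score each gene by the mean accuracy drop when its column is shuffled."""
    rng = random.Random(seed)
    centroids = fit_centroids(X, y)  # fitted once, on the unpermuted data
    baseline = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, permuted, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data (hypothetical): gene 0 separates the two cell types, gene 1 does not.
X = [[0.0, 1.0], [0.0, 2.0], [0.0, 3.0], [0.0, 4.0],
     [10.0, 1.0], [10.0, 2.0], [10.0, 3.0], [10.0, 4.0]]
y = ["A"] * 4 + ["B"] * 4
scores = permutation_importance(X, y)
```

In this toy setting the cell-type-specific gene receives a positive score while the uninformative gene scores zero, which is the behaviour a gene selection step would rank on.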

19.

REBET: a method to determine the number of cell clusters based on batch effect removal.

Fang, Zhao-Yu; Lin, Cui-Xiang; Xu, Yun-Pei; Li, Hong-Dong; Xu, Qing-Song.

Brief Bioinform ; 22(6), 2021 11 05.

Article in English | MEDLINE | ID: mdl-34131702

ABSTRACT

In single-cell RNA-seq (scRNA-seq) data analysis, a fundamental problem is to determine the number of cell clusters based on the gene expression profiles. However, the performance of current methods is still far from satisfactory, presumably due to their limitations in capturing the expression variability among cell clusters. Batch effects represent the undesired variability between data measured in different batches; they occur when data are obtained from different labs or protocols. Motivated by the practice of batch effect removal, we considered cell clusters as batches. We hypothesized that the number of cell clusters (i.e. batches) could be correctly determined if the variances among clusters (i.e. batch effects) were removed. We developed a new method, namely, removal of batch effect and testing (REBET), for determining the number of cell clusters. In this method, cells are first partitioned into k clusters. Second, the batch effects among these k clusters are removed. Third, the quality of batch effect removal is evaluated with the average range of normalized mutual information (ARNMI), which measures how uniformly the cells are mixed after batch effect removal. By testing a range of k values, the k value that corresponds to the lowest ARNMI is determined to be the optimal number of clusters. We compared REBET with state-of-the-art methods on 32 simulated datasets and 14 published scRNA-seq datasets. The results show that REBET can accurately and robustly estimate the number of cell clusters and outperforms existing methods. Contact: H.D.L. (hongdong@csu.edu.cn) or Q.S.X. (qsxu@csu.edu.cn).
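The ARNMI score mentioned above builds on normalized mutual information (NMI). The abstract does not define how the range is taken or averaged, so the sketch below covers only plain NMI between two labelings of the same cells. The function names and the choice of arithmetic-mean normalization are assumptions for illustration, not details taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same cells."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (n_ij / n) * math.log(n_ij * n / (count_a[x] * count_b[y]))
        for (x, y), n_ij in joint.items()
    )


def nmi(a, b):
    """NMI normalized by the arithmetic mean of the two entropies (assumed)."""
    if len(a) != len(b):
        raise ValueError("labelings must cover the same cells")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        # A constant labeling carries no information; return 0 by convention.
        return 0.0
    return mutual_information(a, b) / ((h_a + h_b) / 2)


# Identical partitions up to renaming share all information (NMI near 1);
# statistically independent partitions share none (NMI near 0).
same = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

A low NMI between cluster labels and some other grouping of the cells indicates that the clusters are well mixed with respect to that grouping, which is the intuition behind selecting the k with the lowest score.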

Subject(s)

Cluster Analysis; RNA-Seq/methods; Single-Cell Analysis/methods; Algorithms; Databases, Genetic; Reproducibility of Results

20.

Modulating innate immune activation states impacts the efficacy of specific Aβ immunotherapy.

Levites, Yona; Funk, Cory; Wang, Xue; Chakrabarty, Paramita; McFarland, Karen N; Bramblett, Baxter; O'Neal, Veronica; Liu, Xufei; Ladd, Thomas; Robinson, Max; Allen, Mariet; Carrasquillo, Minerva M; Dickson, Dennis; Cruz, Pedro; Ryu, Danny; Li, Hong-Dong; Price, Nathan D; Ertekin-Taner, Nilüfer; Golde, Todd E.

Mol Neurodegener ; 16(1): 32, 2021 05 06.

Article in English | MEDLINE | ID: mdl-33957936

ABSTRACT

INTRODUCTION: Passive immunotherapies targeting Aβ continue to be evaluated as Alzheimer's disease (AD) therapeutics, but there remains debate over the mechanisms by which these immunotherapies work. Besides the amount of preexisting Aβ deposition and the type of deposit (compact or diffuse), there is little data concerning what factors, independent of those intrinsic to the antibody, might influence efficacy. Here we (i) explored how constitutive priming of the underlying innate activation states by Il10 and Il6 might influence passive Aβ immunotherapy and (ii) evaluated transcriptomic data generated in the AMP-AD initiative to inform how these two cytokines and their receptors' mRNA levels are altered in human AD and an APP mouse model. METHODS: rAAV2/1 encoding EGFP, Il6 or Il10 were delivered by somatic brain transgenesis to neonatal (P0) TgCRND8 APP mice. Then, at 2 months of age, the mice were treated bi-weekly with a high-affinity anti-Aβ1-16 mAb5 monoclonal antibody or control mouse IgG until 6 months of age. rAAV-mediated transgene expression, amyloid accumulation, Aβ levels and gliosis were assessed. Extensive transcriptomic data were used to evaluate the mRNA expression levels of IL10 and IL6 and their receptors in the postmortem human AD temporal cortex and in the brains of TgCRND8 mice, the latter at multiple ages. RESULTS: Priming TgCRND8 mice with Il10 increases Aβ loads and blocks efficacy of subsequent mAb5 passive immunotherapy, whereas priming with Il6 reduces Aβ loads by itself and subsequent Aβ immunotherapy shows only a slightly additive effect. Transcriptomic data show that (i) there are significant increases in the mRNA levels of Il6 and Il10 receptors in the TgCRND8 mouse model and temporal cortex of humans with AD and (ii) there is a great deal of variance in individual mouse brain and the human temporal cortex of these interleukins and their receptors. CONCLUSIONS: The underlying immune activation state can markedly affect the efficacy of passive Aβ immunotherapy. These results have important implications for ongoing human AD immunotherapy trials, as they indicate that underlying immune activation states within the brain, which may be highly variable, may influence the ability of passive immunotherapy to alter Aβ deposition.

Subject(s)

Alzheimer Disease/immunology; Amyloid beta-Peptides/antagonists & inhibitors; Antibodies, Monoclonal/pharmacology; Immunity, Innate/drug effects; Immunization, Passive/methods; Animals; Humans; Interleukin-10/immunology; Interleukin-6/immunology; Mice; Mice, Transgenic
