1.
Nucleic Acids Res ; 52(D1): D990-D997, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37831073

ABSTRACT

Rare variants contribute significantly to the genetic causes of complex traits, as they can have much larger effects than common variants and account for much of the missing heritability in genome-wide association studies. The emergence of UK Biobank scale datasets and accurate gene-level rare variant-trait association testing methods has dramatically increased the number of rare variant associations that have been detected. However, no systematic collection of these associations has been carried out to date, especially at the gene level. To address this issue, we present the Rare Variant Association Repository (RAVAR), a comprehensive collection of rare variant associations. RAVAR includes 95 047 high-quality rare variant associations (76 186 gene-level and 18 861 variant-level associations) for 4429 reported traits which are manually curated from 245 publications. RAVAR is the first resource to collect and curate published rare variant associations in an interactive web interface with integrated visualization, search, and download features. Detailed gene and SNP information is provided for each association, and users can conveniently search for related studies by exploring the EFO tree structure and interactive Manhattan plots. RAVAR could vastly improve the accessibility of rare variant studies. RAVAR is freely available for all users without login requirement at http://www.ravar.bio.


Subject(s)
Databases, Genetic , Genetic Variation , Genome-Wide Association Study , Genome-Wide Association Study/methods , Multifactorial Inheritance , Phenotype
2.
BMC Genomics ; 25(1): 300, 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38515040

ABSTRACT

BACKGROUND: The Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) utilizes the transposase Tn5 to probe open chromatin, which simultaneously reveals multiple transcription factor binding sites (TFBSs) compared to traditional technologies. Deep learning (DL) technology, including convolutional neural networks (CNNs), has successfully found motifs from ATAC-seq data. Due to the limited width of convolutional kernels, the existing models only find motifs with fixed lengths. A graph neural network (GNN) can work on non-Euclidean data and therefore has the potential to find ATAC-seq motifs with different lengths. However, the existing GNN models ignore the relationships among ATAC-seq sequences, and their parameter settings should be improved. RESULTS: In this study, we propose a novel GNN model named GNNMF to find ATAC-seq motifs via GNN and background coexisting probability. Experiments conducted on 200 human datasets and 80 mouse datasets demonstrated that GNNMF improved the area of the eight-metric radar score by 4.92% and 6.81%, respectively, and found more motifs than the existing models did. CONCLUSIONS: In this study, we developed a novel model named GNNMF for finding multiple ATAC-seq motifs. GNNMF builds a multi-view heterogeneous graph from ATAC-seq sequences and utilizes background coexisting probability and the iterloss to find ATAC-seq motifs of different lengths and optimize the parameter sets. Compared to existing models, GNNMF achieved the best performance on TFBS prediction and ATAC-seq motif finding, which demonstrates that our improvement is applicable to ATAC-seq motif finding.


Subject(s)
Chromatin Immunoprecipitation Sequencing , High-Throughput Nucleotide Sequencing , Humans , Animals , Mice , Sequence Analysis, DNA , Chromatin/genetics , Neural Networks, Computer
3.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33200787

ABSTRACT

Simultaneous profiling of transcriptomic and chromatin accessibility information in the same individual cells offers an unprecedented resolution to understand cell states. However, computationally effective methods for the integration of these inherently sparse and heterogeneous data are lacking. Here, we present a single-cell multimodal variational autoencoder model, which combines three types of joint-learning strategies with a probabilistic Gaussian Mixture Model to learn the joint latent features that accurately represent these multilayer profiles. Studies on both simulated and real datasets demonstrate that it has a superior capability in (i) dissecting cellular heterogeneity in the joint-learning space, (ii) denoising and imputing data and (iii) constructing the association between multilayer omics data, which can be used for understanding transcriptional regulatory mechanisms.


Subject(s)
Chromatin/metabolism , Databases, Factual , Deep Learning , Models, Biological , Single-Cell Analysis , Transcriptome , Chromatin/genetics , Humans , K562 Cells
4.
Bioinformatics ; 37(22): 4091-4099, 2021 11 18.
Article in English | MEDLINE | ID: mdl-34028557

ABSTRACT

MOTIVATION: Joint profiling of single-cell transcriptomics and epigenomics data enables us to characterize cell states and transcriptomics regulatory programs related to cellular heterogeneity. However, the large differences in sparsity, heterogeneity and dimensionality between multi-omics data have severely hindered their integrative analysis. RESULTS: We propose the deep cross-omics cycle attention (DCCA) model, a computational tool for joint analysis of single-cell multi-omics data that combines variational autoencoders (VAEs) and attention-transfer. Specifically, we show that DCCA can leverage one omics data type to fine-tune the network trained for another omics data type, given a dataset of parallel multi-omics data within the same cell. In studies on both simulated and real datasets from various platforms, DCCA demonstrates its superior capability in (i) dissecting cellular heterogeneity; (ii) denoising and aggregating data and (iii) constructing the link between multi-omics data, which is used to infer new transcriptional regulatory relations. In our applications, DCCA was demonstrated to have superior power to generate missing stages or omics in a biologically meaningful manner, which provides a new way to analyze and understand complicated biological processes. AVAILABILITY AND IMPLEMENTATION: DCCA source code is available at https://github.com/cmzuo11/DCCA, and has been deposited in archived format at https://doi.org/10.5281/zenodo.4762065. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Multiomics , Software , Gene Expression Regulation , Epigenomics
5.
J Hepatol ; 75(5): 1128-1141, 2021 11.
Article in English | MEDLINE | ID: mdl-34171432

ABSTRACT

BACKGROUND & AIMS: Our previous genomic whole-exome sequencing (WES) data identified the key ErbB pathway mutations that play an essential role in regulating the malignancy of gallbladder cancer (GBC). Herein, we tested the hypothesis that individual cellular components of the tumor microenvironment (TME) in GBC function differentially to participate in ErbB pathway mutation-dependent tumor progression. METHODS: We engaged single-cell RNA-sequencing to reveal transcriptomic heterogeneity and intercellular crosstalk from 13 human GBCs and adjacent normal tissues. In addition, we performed WES analysis to reveal the genomic variations related to tumor malignancy. A variety of bulk RNA-sequencing, immunohistochemical staining, immunofluorescence staining and functional experiments were employed to study the difference between tissues with or without ErbB pathway mutations. RESULTS: We identified 16 cell types from a total of 114,927 cells, in which epithelial cells, M2 macrophages, and regulatory T cells were predominant in tumors with ErbB pathway mutations. Furthermore, epithelial cell subtype 1, 2 and 3 were mainly found in adenocarcinoma and subtype 4 was present in adenosquamous carcinoma. The tumors with ErbB pathway mutations harbored larger populations of epithelial cell subtype 1 and 2, and expressed higher levels of secreted midkine (MDK) than tumors without ErbB pathway mutations. Increased MDK resulted in an interaction with its receptor LRP1, which is expressed by tumor-infiltrating macrophages, and promoted immunosuppressive macrophage differentiation. Moreover, the crosstalk between macrophage-secreted CXCL10 and its receptor CXCR3 on regulatory T cells was induced in GBC with ErbB pathway mutations. Elevated MDK was correlated with poor overall survival in patients with GBC. 
CONCLUSIONS: This study has provided valuable insights into transcriptomic heterogeneity and the global cellular network in the TME, which coordinately functions to promote the progression of GBC with ErbB pathway mutations; thus, unveiling novel cellular and molecular targets for cancer therapy. LAY SUMMARY: We employed single-cell RNA-sequencing and functional assays to uncover the transcriptomic heterogeneity and intercellular crosstalk present in gallbladder cancer. We found that ErbB pathway mutations reduced anti-cancer immunity and led to cancer development. ErbB pathway mutations resulted in immunosuppressive macrophage differentiation and regulatory T cell activation, explaining the reduced anti-cancer immunity and worse overall survival observed in patients with these mutations.


Subject(s)
ErbB Receptors/immunology , Gallbladder Neoplasms/immunology , Immunocompromised Host/physiology , Midkine/adverse effects , Cell Proliferation/genetics , China/epidemiology , ErbB Receptors/antagonists & inhibitors , Gallbladder Neoplasms/epidemiology , Gallbladder Neoplasms/physiopathology , Humans , Midkine/genetics , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/statistics & numerical data , Signal Transduction/genetics , Single-Cell Analysis/methods , Single-Cell Analysis/statistics & numerical data , Exome Sequencing/methods , Exome Sequencing/statistics & numerical data
6.
Nat Commun ; 15(1): 5057, 2024 Jun 13.
Article in English | MEDLINE | ID: mdl-38871687

ABSTRACT

Spatially resolved transcriptomics (SRT) has enabled precise dissection of the tumor microenvironment (TME) by analyzing its intracellular molecular networks and intercellular cell-cell communication (CCC). However, the lack of computational exploration of the complicated relations between cells, genes, and histological regions severely limits the ability to interpret the complex structure of the TME. Here, we introduce stKeep, a heterogeneous graph (HG) learning method that integrates multimodality and gene-gene interactions to unravel the TME from SRT data. stKeep leverages HG to learn both cell-modules and gene-modules by incorporating features of diverse nodes including genes, cells, and histological regions, allowing for the identification of finer cell-states within the TME and cell-state-specific gene-gene relations, respectively. Furthermore, stKeep employs HG to infer CCC for each cell, while ensuring that learned CCC patterns are comparable across different cell-states through contrastive learning. In various cancer samples, stKeep outperforms other tools in dissecting the TME, such as detecting bi-potent basal populations, neoplastic myoepithelial cells, and metastatic cells distributed within the tumor or leading-edge regions. Notably, stKeep identifies key transcription factors, ligands, and receptors relevant to disease progression, which are further validated by the functional and survival analysis of independent clinical data, thereby highlighting its clinical prognostic and immunotherapy applications.


Subject(s)
Neoplasms , Transcriptome , Tumor Microenvironment , Tumor Microenvironment/genetics , Humans , Neoplasms/genetics , Neoplasms/pathology , Gene Expression Regulation, Neoplastic , Gene Expression Profiling/methods , Cell Communication/genetics , Computational Biology/methods , Gene Regulatory Networks , Machine Learning
7.
Cancer Lett ; 586: 216675, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38280478

ABSTRACT

Gallbladder cancer (GBC) is among the most common malignancies of the biliary tract system, and its treatment options are limited. Immunotherapeutic targets for T cells are appealing; however, the heterogeneity of T cells hinders their further development. We systematically constructed a T cell atlas by single-cell RNA sequencing and utilized the identified gene signatures of high_CNV_T cells to predict molecular subtypes towards personalized therapeutic treatments for GBC. We identified 12 T cell subtypes, among which exhausted CD8+ T cells, activated/exhausted CD8+ T cells, and regulatory T cells were predominant in tumors. There appeared to be an inverse relationship between Th17 and Treg populations, with Th17 levels significantly reduced whereas Tregs were concomitantly increased. Furthermore, we first established a subtyping criterion to identify three subtypes of GBC based on their pro-tumorigenic microenvironments, e.g., the type 1 group shows more M2 macrophage infiltration, while the type 2 group is infiltrated by highly exhausted CD8+ T cells, B cells and Tregs with suppressive activities. Our study provides valuable insights into T cell heterogeneity and suggests that molecular subtyping based on T cells might provide a potential immunotherapeutic strategy to improve GBC treatment.


Subject(s)
CD8-Positive T-Lymphocytes , Gallbladder Neoplasms , Humans , CD8-Positive T-Lymphocytes/metabolism , Gallbladder Neoplasms/genetics , Gallbladder Neoplasms/therapy , Gallbladder Neoplasms/metabolism , T-Lymphocytes, Regulatory/pathology , Immunotherapy , Macrophages/pathology , Tumor Microenvironment
8.
Comput Med Imaging Graph ; 98: 102057, 2022 06.
Article in English | MEDLINE | ID: mdl-35561640

ABSTRACT

Brain networks constructed with regions of interest (ROIs) from structural magnetic resonance imaging (sMRI) images are widely investigated for detecting Alzheimer's disease (AD). However, the ROI is generally represented by spatial domain-based features, and little attention has been paid to constructing a brain network with frequency domain-based features. To accurately characterize the ROI in the frequency domain and then construct an individual network, this study proposes a novel method that describes each ROI by directional subbands and captures correlations between ROIs, yielding a shearlet subband energy feature-based individual network (SSBIN) for AD detection. Specifically, the SSBIN is constructed with 90 ROIs which are segmented from the pre-processed sMRI image based on the automated anatomical labeling atlas; the 90 ROIs are represented by directional subband-based energy feature vectors (SVs) formed by joining energy features extracted from their directional subbands; and the weight values of the SSBIN are computed by Pearson's correlation coefficient (PCC). Subsequently, two network features are extracted from the SSBIN: the node feature vector (NV) is computed by averaging the 90 SVs, and the low dimensional edge feature vector (LV) is obtained by kernel principal component analysis (KPCA). The concatenation of NV and LV is then used as an SSBIN-based feature for the sMRI image. Finally, we use a support vector machine (SVM) with the radial basis function kernel as the classifier to categorize 680 subjects selected from the AD Neuroimaging Initiative (ADNI) database. Experimental results validate that the ROI can be properly characterized by the NV, and that correlations between ROIs captured by the LV play an important role in AD detection. In addition, a series of comparisons with four current state-of-the-art approaches demonstrates the higher AD detection performance of the SSBIN method.


Subject(s)
Alzheimer Disease , Alzheimer Disease/diagnostic imaging , Brain/diagnostic imaging , Humans , Magnetic Resonance Imaging/methods , Neuroimaging , Support Vector Machine
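
The network construction described in the abstract above (edge weights from Pearson's correlation between per-ROI feature vectors, node feature as the average of those vectors) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names `pearson` and `build_network` and the toy input are hypothetical, the shearlet feature extraction, KPCA and SVM steps are omitted, and a constant feature vector (zero variance) is not handled.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def build_network(roi_vectors):
    """Return (edge weight matrix, node feature vector) for a list of ROI vectors."""
    k = len(roi_vectors)
    # Edge weight between ROI i and ROI j: correlation of their feature vectors.
    weights = [[pearson(roi_vectors[i], roi_vectors[j]) for j in range(k)]
               for i in range(k)]
    # Node feature vector: element-wise average over all ROI vectors.
    dim = len(roi_vectors[0])
    node_vector = [sum(v[d] for v in roi_vectors) / k for d in range(dim)]
    return weights, node_vector


# Hypothetical toy input: 3 ROIs, each with a 4-value energy feature vector
# (the abstract describes 90 ROIs; the values here are made up).
toy_rois = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
weights, node_vector = build_network(toy_rois)
```

With real data the weight matrix would be 90 x 90, and its entries would presumably be reduced by KPCA before being concatenated with the node vector, as the abstract outlines.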
9.
Comput Struct Biotechnol J ; 20: 3556-3566, 2022.
Article in English | MEDLINE | ID: mdl-35860411

ABSTRACT

We developed a new computational method, Single-Cell Entropy Network (SCEN), to analyze single-cell RNA-seq data, which uses the information of gene-gene associations to discover new heterogeneity of immune cells as well as identify existing cell types. Based on SCEN, we defined association-entropy (AE) for each cell and each gene through single-cell gene co-expression networks to measure the strength of association between each gene and all other genes at a single-cell resolution. Analyses of public datasets indicated that the AE of ribosomal protein genes (RP genes) varied greatly even in the same cell type of immune cells and that the average AE of RP genes of immune cells in each person was significantly associated with the healthy/disease state of this person. Based on existing research and theory, we inferred that the AE of RP genes represented the heterogeneity of ribosomes and reflected the activity of immune cells. We believe SCEN can provide more biological insights into the heterogeneity and diversity of immune cells, especially the changes in immune cells in disease.

10.
Nat Commun ; 13(1): 5962, 2022 10 10.
Article in English | MEDLINE | ID: mdl-36216831

ABSTRACT

Spatially resolved transcriptomics (SRT) technology enables us to gain novel insights into tissue architecture and cell development, especially in tumors. However, the lack of computational exploitation of biological contexts and multi-view features severely hinders the elucidation of tissue heterogeneity. Here, we propose stMVC, a multi-view graph collaborative-learning model that integrates histology, gene expression, spatial location, and biological contexts in analyzing SRT data by attention. Specifically, stMVC adopts a semi-supervised graph attention autoencoder to separately learn view-specific representations of the histological-similarity-graph or spatial-location-graph, and then simultaneously integrates the two-view graphs for robust representations through attention under semi-supervision of biological contexts. stMVC outperforms other tools in detecting tissue structure, inferring trajectory relationships, and denoising on benchmark slices of human cortex. In particular, stMVC identifies disease-related cell-states and their transition cell-states in a breast cancer study, which are further validated by the functional and survival analysis of independent clinical data. These results demonstrate clinical and prognostic applications of SRT data.


Subject(s)
Breast Neoplasms , Interdisciplinary Placement , Breast Neoplasms/genetics , Female , Humans , Transcriptome/genetics
11.
Genes (Basel) ; 12(12)2021 11 23.
Article in English | MEDLINE | ID: mdl-34946794

ABSTRACT

Rapid advances in single-cell genomics sequencing (SCGS) have allowed researchers to characterize tumor heterozygosity with unprecedented resolution and reveal the phylogenetic relationships between tumor cells or clones. However, high sequencing error rates of current SCGS data, i.e., false positives, false negatives, and missing bases, severely limit its application. Here, we present a deep learning framework, RDAClone, to recover genotype matrices from noisy data with an extended robust deep autoencoder, cluster cells into subclones by the Louvain-Jaccard method, and further infer evolutionary relationships between subclones by the minimum spanning tree. Studies on both simulated and real datasets demonstrate its robustness and superiority in data denoising, cell clustering, and evolutionary tree reconstruction, particularly for large datasets.


Subject(s)
Genomics/methods , Neoplasms/genetics , Single-Cell Analysis/methods , Algorithms , Biological Evolution , Cluster Analysis , Data Analysis , Deep Learning , Phylogeny
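
The last step mentioned in the abstract above, linking subclones by a minimum spanning tree, can be illustrated with a small sketch. It assumes, purely for illustration, that each subclone is summarized by a binary consensus genotype and that distances are Hamming distances; the abstract does not state which distance is used. The names `hamming`, `minimum_spanning_tree` and the toy subclones are hypothetical, and the denoising and clustering steps are not shown.

```python
def hamming(a, b):
    """Number of sites at which two genotype profiles differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def minimum_spanning_tree(profiles):
    """Prim's algorithm; returns a list of (parent, child, distance) edges."""
    names = list(profiles)
    in_tree = {names[0]}  # grow the tree from the first listed profile
    edges = []
    while len(in_tree) < len(names):
        best = None
        # Find the cheapest edge leaving the current tree.
        for u in names:
            if u not in in_tree:
                continue
            for v in names:
                if v in in_tree:
                    continue
                d = hamming(profiles[u], profiles[v])
                if best is None or d < best[2]:
                    best = (u, v, d)
        edges.append(best)
        in_tree.add(best[1])
    return edges


# Hypothetical subclone consensus genotypes (1 = mutated, 0 = reference).
subclones = {
    "normal": [0, 0, 0, 0, 0],
    "A": [1, 1, 0, 0, 0],
    "B": [1, 1, 1, 0, 0],
    "C": [1, 1, 0, 1, 1],
}
tree = minimum_spanning_tree(subclones)
# tree == [("normal", "A", 2), ("A", "B", 1), ("A", "C", 2)]
```

Rooting the tree at a normal (unmutated) profile is a choice made here so that edges read as parent-to-child steps; an undirected tree would carry the same edges.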
12.
Genes (Basel) ; 10(3)2019 03 07.
Article in English | MEDLINE | ID: mdl-30866472

ABSTRACT

It is important to explore the intrinsic differences between breast cancer subtypes. These intrinsic differences are closely related to clinical diagnosis and the design of treatment plans. With the accumulation of biological and medical datasets, there are many different omics data that can be viewed from different aspects. Combining these multiple omics data can improve the accuracy of prediction. Meanwhile, there are also many different databases from which different types of omics data can be downloaded. In this article, we use estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) to define breast cancer subtypes and classify any two breast cancer subtypes using the SMO-MKL algorithm. We collected mRNA data, methylation data and copy number variation (CNV) data from TCGA to classify breast cancer subtypes. Multiple Kernel Learning (MKL) is employed to use these omics data distinctly. The result of using three omics data with multiple kernels is better than that of using single omics data with multiple kernels. Furthermore, the significant genes and pathways discovered in the feature selection process are also analyzed. In experiments, the proposed method outperforms other state-of-the-art methods and has abundant biological interpretations.


Subject(s)
Genomics/methods , Machine Learning , Triple Negative Breast Neoplasms/genetics , DNA Copy Number Variations , DNA Methylation , Female , Humans , RNA, Messenger/genetics , RNA, Messenger/metabolism , Receptor, ErbB-2/genetics , Receptor, ErbB-2/metabolism , Receptors, Estrogen/genetics , Receptors, Estrogen/metabolism , Receptors, Progesterone/genetics , Receptors, Progesterone/metabolism , Triple Negative Breast Neoplasms/classification
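
The core idea behind multiple kernel learning in the abstract above, one kernel per omics view combined into a single kernel, can be sketched as a weighted sum of Gram matrices. This is a minimal sketch under stated assumptions: the weights below are fixed by hand, whereas an MKL solver such as SMO-MKL would learn them; linear kernels are used only for simplicity; and the function names and toy data are hypothetical.

```python
def linear_kernel(samples):
    """Gram matrix of inner products between samples (rows)."""
    return [[sum(a * b for a, b in zip(x, y)) for y in samples] for x in samples]


def combine_kernels(kernels, weights):
    """Weighted sum of same-sized Gram matrices."""
    n = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels))
             for j in range(n)] for i in range(n)]


# Hypothetical toy data: 2 samples described by three omics views.
mrna = [[1.0, 0.0], [0.0, 1.0]]
methylation = [[2.0], [1.0]]
cnv = [[1.0, 1.0], [1.0, -1.0]]

kernels = [linear_kernel(view) for view in (mrna, methylation, cnv)]
# Hand-picked weights (an assumption); MKL would optimize these.
combined = combine_kernels(kernels, weights=[0.5, 0.25, 0.25])
# combined == [[2.0, 0.5], [0.5, 1.25]]
```

The combined matrix could then be passed to any kernel classifier that accepts a precomputed kernel. A non-negative weighted sum of valid kernels is itself a valid kernel, which is why this form of combination is commonly used.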
13.
PLoS One ; 13(9): e0204426, 2018.
Article in English | MEDLINE | ID: mdl-30248119

ABSTRACT

Switchgrass is an important bioenergy crop typically grown in marginal lands, where the plants must often deal with abiotic stresses such as drought and salt. Of two switchgrass ecotypes, Alamo is known to be more tolerant to both stress types than Dacotah. Understanding their stress response and adaptation programs can have important implications for engineering more stress-tolerant plants. We present here a computational study analyzing time-course transcriptomic data of the two ecotypes to elucidate and compare their regulatory systems in response to drought and salt stresses. A total of 1,693 genes (target genes or TGs) are found to be differentially expressed and possibly regulated by 143 transcription factors (TFs) in response to drought stress in the two ecotypes together. Similarly, 1,535 TGs regulated by 110 TFs are identified as being involved in the response to salt stress. Two regulatory networks are constructed to predict their regulatory relationships. In addition, a time-dependent hidden Markov model is derived for each ecotype responding to each stress type, to provide a dynamic view of how each regulatory network changes its behavior over time. A few new insights about the response mechanisms are predicted from the regulatory networks and the time-dependent models. Comparative analyses between the network models of the two ecotypes reveal key commonalities and main differences between the two regulatory systems. Overall, our results provide new information about the complex regulatory mechanisms of switchgrass responding to drought and salt stresses.


Subject(s)
Droughts , Gene Expression Regulation, Plant , Panicum/metabolism , Salt Stress/physiology , Alternative Splicing , Computer Simulation , Ecotype , Gene Expression Regulation, Plant/physiology , Markov Chains , Panicum/genetics , Species Specificity , Transcriptome
14.
Biotechnol Biofuels ; 11: 170, 2018.
Article in English | MEDLINE | ID: mdl-29951114

ABSTRACT

BACKGROUND: Switchgrass (Panicum virgatum L.) is an important bioenergy crop widely used for lignocellulosic research. While extensive transcriptomic analyses have been conducted on this species using short read-based sequencing techniques, very little has been reliably derived regarding alternatively spliced (AS) transcripts. RESULTS: We present an analysis of transcriptomes of six switchgrass tissue types pooled together, sequenced using Pacific Biosciences (PacBio) single-molecule long-read technology. Our analysis identified 105,419 unique transcripts covering 43,570 known genes and 8795 previously unknown genes. Of these, 45,168 are novel transcripts of known genes. A total of 60,096 AS transcripts are identified, 45,628 being novel. We have also predicted 1549 transcripts of genes involved in cell wall construction and remodeling, 639 being novel transcripts of known cell wall genes. Most of the predicted transcripts are validated against Illumina-based short reads. Specifically, 96% of the splice junction sites in all the unique transcripts are validated by at least five Illumina reads. Comparisons between genes derived from our identified transcripts and the current genome annotation revealed that among the gene set predicted by both analyses, 16,640 have different exon-intron structures. CONCLUSIONS: Overall, a substantial amount of new information is derived from the PacBio RNA data regarding both the transcriptome and the genome of switchgrass.
