Results 1 - 20 of 122
1.
Genome Res ; 33(10): 1788-1805, 2023 10.
Article in English | MEDLINE | ID: mdl-37827697

ABSTRACT

Cell-cell communication (CCC) is critical for determining cell fates and functions in multicellular organisms. With the advent of single-cell RNA-sequencing (scRNA-seq) and spatial transcriptomics (ST), an increasing number of CCC inference methods have been developed. Nevertheless, a thorough comparison of their performance has yet to be conducted. To fill this gap, we developed a systematic benchmark framework called ESICCC to evaluate 18 ligand-receptor (LR) inference methods and five ligand/receptor-target inference methods using a total of 116 data sets, including 15 ST data sets, 15 sets of cell line perturbation data, two sets of cell type-specific expression/proteomics data, and 84 sets of sampled or unsampled scRNA-seq data. We evaluated and compared the agreement, accuracy, robustness, and usability of these methods. In the accuracy evaluation, RNAMagnet, CellChat, and scSeqComm emerge as the three best-performing methods for intercellular ligand-receptor inference from scRNA-seq data, whereas stMLnet and HoloNet are the best methods for predicting ligand/receptor-target regulation from ST data. To facilitate practical application, we provide a decision-tree-style guideline that helps users choose the best tools for their specific research questions in CCC inference, and we develop an ensemble pipeline, CCCbank, that enables versatile combinations of methods and databases. Moreover, our comparative results uncover several critical factors influencing CCC inference, such as prior interaction information, the ligand-receptor scoring algorithm, intracellular signaling complexity, and spatial relationships, which may be considered in future studies to advance the development of new methodologies.
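The abstract names the ligand-receptor scoring algorithm as a factor that drives differences between methods. As a concrete point of reference, one simple baseline scores an LR pair between two cell types as the mean ligand expression in the sender times the mean receptor expression in the receiver. The sketch below assumes exactly that rule and a toy expression table with illustrative values; it is not the scoring used by ESICCC, CellChat or any other benchmarked method.

```python
from statistics import mean

# Illustrative product-of-means ligand-receptor score; an assumed baseline,
# not the scoring rule of ESICCC or of any specific benchmarked method.
def lr_score(expr, labels, ligand, receptor, sender, receiver):
    """Mean ligand expression in sender cells times mean receptor
    expression in receiver cells.

    expr: dict gene -> list of per-cell expression values
    labels: list of cell-type labels aligned with the per-cell values
    """
    send_idx = [i for i, t in enumerate(labels) if t == sender]
    recv_idx = [i for i, t in enumerate(labels) if t == receiver]
    if not send_idx or not recv_idx:
        return 0.0
    lig = mean(expr[ligand][i] for i in send_idx)
    rec = mean(expr[receptor][i] for i in recv_idx)
    return lig * rec


def score_all_pairs(expr, labels, lr_pairs):
    """Score every ordered (sender, receiver) cell-type pair for each LR pair."""
    types = sorted(set(labels))
    return {
        (lig, rec, s, r): lr_score(expr, labels, lig, rec, s, r)
        for lig, rec in lr_pairs
        for s in types
        for r in types
    }


# Toy data (hypothetical values): two fibroblasts and two T cells.
expr = {"TGFB1": [4.0, 2.0, 0.0, 0.0], "TGFBR1": [0.0, 0.0, 3.0, 1.0]}
labels = ["Fibroblast", "Fibroblast", "T cell", "T cell"]
scores = score_all_pairs(expr, labels, [("TGFB1", "TGFBR1")])
print(scores[("TGFB1", "TGFBR1", "Fibroblast", "T cell")])  # 6.0
```

Published methods usually add a significance test (for example, by permuting cell labels) and handle multi-subunit receptor complexes, which this baseline omits.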


Subject(s)
Single-Cell Analysis , Software , Ligands , Single-Cell Analysis/methods , Algorithms , Cell Communication/genetics , Sequence Analysis, RNA/methods
2.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38366803

ABSTRACT

The evolution of single-cell RNA sequencing (scRNA-seq) technology has opened a new avenue for researchers to inspect cellular heterogeneity with single-cell precision. One crucial aspect of this technology is cell-type annotation, which is fundamental for any subsequent analysis in single-cell data mining. Recently, the scientific community has seen a surge in the development of automatic annotation methods aimed at this task. However, these methods generally operate with a fixed total cell-type capacity, significantly restricting the annotation systems' capacity for continuous knowledge acquisition. Furthermore, building a unified scRNA-seq annotation system remains challenging because it must progressively expand its understanding of ever-increasing cell-type concepts derived from a continuous data stream. In response to these challenges, this paper presents a novel and challenging setting for annotation, namely cell-type incremental annotation, which is designed to continually enhance cell-type knowledge gleaned from continuously incoming data. The task is difficult because data stream samples can be observed only once, which leads to catastrophic forgetting. To address this problem, we introduce scEVOLVE, an incremental annotation method built upon contrastive sample replay combined with the fundamental principle of partition confidence maximization. Specifically, we retain and replay portions of the old data in each subsequent training phase, and we establish a unique prototypical learning objective, as an alternative to cross-entropy, to mitigate the cell-type imbalance problem. To effectively emulate a model that trains concurrently on the complete data, we introduce a cell-type decorrelation strategy that efficiently scatters the feature representations of each cell type uniformly.
We designed the scEVOLVE framework to be simple and easy to integrate into most deep softmax-based single-cell annotation methods. Thorough experiments on a range of meticulously constructed benchmarks consistently show that our method can incrementally learn numerous cell types over an extended period, outperforming other strategies that fail quickly. To our knowledge, this is the first attempt to propose and formulate an end-to-end algorithmic framework to address this new, practical task. scEVOLVE is implemented in Python using the PyTorch machine-learning library and is freely accessible at https://github.com/aimeeyaoyao/scEVOLVE.
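scEVOLVE replaces cross-entropy with a prototypical learning objective. To illustrate the general idea only (this is not scEVOLVE's actual loss), the sketch below computes one prototype per cell type as the mean embedding, converts negative squared Euclidean distances to the prototypes into softmax probabilities, and takes the negative log-probability of the true type as the loss. The embeddings, labels and the `temperature` parameter are hypothetical.

```python
import math

# Generic prototype-based classification sketch; not scEVOLVE's objective.
def prototypes(embeddings, labels):
    """Mean embedding per cell-type label."""
    sums, counts = {}, {}
    for x, y in zip(embeddings, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [a + b for a, b in zip(sums[y], x)]
        counts[y] += 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def proto_probs(x, protos, temperature=1.0):
    """Softmax over negative squared distances from x to each prototype."""
    logits = {
        y: -sum((a - b) ** 2 for a, b in zip(x, p)) / temperature
        for y, p in protos.items()
    }
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {y: math.exp(v - m) for y, v in logits.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def proto_loss(x, y, protos, temperature=1.0):
    """Negative log-probability of the true label under the prototype softmax."""
    return -math.log(proto_probs(x, protos, temperature)[y])


protos = prototypes([[0, 0], [0, 1], [5, 5], [5, 6]], ["A", "A", "B", "B"])
print(protos)  # {'A': [0.0, 0.5], 'B': [5.0, 5.5]}
```

Because each cell type contributes exactly one prototype regardless of how many cells it has, the decision rule is less dominated by abundant types, which is in line with the imbalance motivation stated in the abstract.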


Subject(s)
Algorithms , Single-Cell Gene Expression Analysis , Benchmarking , Entropy , Gene Library , Sequence Analysis, RNA , Gene Expression Profiling , Cluster Analysis
3.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38678389

ABSTRACT

MOTIVATION: Over the past decade, single-cell transcriptomic technologies have advanced remarkably, enabling the simultaneous profiling of gene expression across thousands of individual cells. Cell type identification plays an essential role in exploring tissue heterogeneity and characterizing differences in cell state. As more and more well-annotated reference data become available, a large number of automatic identification methods have emerged to simplify the annotation of unlabeled target data by transferring cell type knowledge. In practice, however, the target data often include novel cell types that are absent from the reference data. Most existing works classify these private cells into one generic 'unassigned' group and learn the features of known and novel cell types in a coupled way. Such methods are susceptible to batch effects and fail to explore the fine-grained semantic knowledge of novel cell types, which hurts the model's discriminative ability. Additionally, emerging spatial transcriptomic technologies, such as in situ hybridization, sequencing and multiplexed imaging, pose a new challenge to current cell type identification strategies, which predominantly neglect spatial organization. Consequently, it is imperative to develop a versatile method that can annotate single-cell transcriptomics data both with and without spatial information. RESULTS: To address these issues, we propose a new, challenging yet realistic task called universal cell type identification for single-cell and spatial transcriptomics data. In this task, we aim to assign semantic labels to target cells from known cell types and cluster labels to those from novel ones. Rather than designing a suboptimal two-stage approach, we propose an end-to-end algorithm called scBOL, built from the perspective of Bipartite prototype alignment.
First, we identify the mutual nearest clusters in the reference and target data as their potential common cell types. On this basis, we mine cycle-consistent semantic anchor cells to build an intrinsic structural association between the two datasets. Second, we design a neighbor-aware prototypical learning paradigm to strengthen inter-cluster separability and intra-cluster compactness within each dataset, thereby encouraging discriminative feature representations. Third, driven by the semantic-aware prototypical learning framework, we align the known cell types across reference and target data and separate the private cell types from them. The algorithm can be applied seamlessly to various data types modeled by different foundation models, provided they generate embedding features for cells. Specifically, for non-spatial single-cell transcriptomics data, we use an autoencoder neural network to learn latent low-dimensional cell representations, and for spatial single-cell transcriptomics data, we apply a graph convolutional network to capture the molecular and spatial similarities of cells jointly. Extensive results on our carefully designed evaluation benchmarks demonstrate the superiority of scBOL over various state-of-the-art cell type identification methods. To our knowledge, we are the first to present this pragmatic annotation task and to devise a comprehensive algorithmic framework that addresses it across varied types of single-cell data. Finally, scBOL is implemented in Python using the PyTorch machine-learning library, and it is freely available at https://github.com/aimeeyaoyao/scBOL.
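The first step of scBOL, pairing mutual nearest clusters across reference and target data, can be illustrated with cluster centroids: a reference cluster and a target cluster are paired when each is the other's nearest centroid, and target clusters left unpaired become candidates for novel cell types. This is a minimal sketch assuming precomputed embeddings, given cluster labels and Euclidean distance between centroids; scBOL's own criterion and learned embeddings may differ.

```python
import math

# Centroid-level mutual-nearest-cluster matching; an illustrative assumption,
# not scBOL's exact procedure.
def centroids(points, labels):
    """Mean point per cluster label."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: [sum(c) / len(ps) for c in zip(*ps)] for lab, ps in groups.items()}


def mutual_nearest_clusters(ref_cent, tgt_cent):
    """Return sorted (ref_cluster, tgt_cluster) pairs that are mutual nearest centroids."""
    def nearest(c, others):
        return min(others, key=lambda k: math.dist(c, others[k]))

    ref_to_tgt = {r: nearest(c, tgt_cent) for r, c in ref_cent.items()}
    tgt_to_ref = {t: nearest(c, ref_cent) for t, c in tgt_cent.items()}
    return sorted((r, t) for r, t in ref_to_tgt.items() if tgt_to_ref[t] == r)


# Toy example: target cluster 2 has no reference counterpart.
ref = centroids([[0, 0], [0, 1], [10, 0], [10, 1]], ["B cell", "B cell", "T cell", "T cell"])
tgt = centroids([[0.5, 0], [10.2, 0], [50, 50]], [0, 1, 2])
print(mutual_nearest_clusters(ref, tgt))  # [('B cell', 0), ('T cell', 1)]
```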


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Transcriptome , Single-Cell Analysis/methods , Humans , Gene Expression Profiling/methods , Algorithms , Computational Biology/methods , Software
4.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38935069

ABSTRACT

MOTIVATION: In the past decade, single-cell RNA sequencing (scRNA-seq) has emerged as a pivotal method for transcriptomic profiling in biomedical research. Precise cell-type identification is crucial for subsequent analysis of single-cell data, and the integration and refinement of annotated data are essential for building comprehensive databases. However, prevailing annotation techniques often overlook the hierarchical organization of cell types, resulting in inconsistent annotations. Meanwhile, most existing integration approaches fail to integrate datasets with different annotation depths, and none of them can use more intricately annotated datasets or novel biological findings to enhance the labels of outdated data with lower annotation resolution. RESULTS: Here, we introduce scPLAN, a hierarchical computational framework designed for scRNA-seq data analysis. scPLAN excels at annotating unlabeled scRNA-seq data using a reference dataset structured along a hierarchical cell-type tree. It identifies potential novel cell types in a systematic, layer-by-layer manner. Additionally, scPLAN effectively integrates annotated scRNA-seq datasets with varying levels of annotation depth, ensuring consistent refinement of cell-type labels across datasets with lower resolutions. Extensive annotation and novel cell detection experiments demonstrate scPLAN's efficacy. Two case studies showcase how scPLAN integrates datasets with diverse cell-type label resolutions and refines their cell-type labels. AVAILABILITY: https://github.com/michaelGuo1204/scPLAN.
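Layer-by-layer annotation along a cell-type tree can be pictured as a top-down descent that stops, and flags a possible novel type, as soon as no child of the current node is predicted confidently. In the sketch below the tree, the per-node scoring callable `score_children` and the 0.5 threshold are all hypothetical stand-ins; the abstract does not specify scPLAN's classifiers or novelty criterion.

```python
# Top-down hierarchical annotation with early stopping; an illustrative
# assumption, not scPLAN's algorithm. `score_children` is a hypothetical
# callable returning a probability for each child of a node.
def annotate_hierarchically(cell, tree, score_children, threshold=0.5):
    """Descend from 'root' while the best-scoring child is confident.

    tree: dict parent label -> list of child labels (leaves have no entry)
    Returns (path of assigned labels, is_possibly_novel).
    """
    node, path = "root", []
    while tree.get(node):
        probs = score_children(cell, node, tree[node])
        best = max(probs, key=probs.get)
        if probs[best] < threshold:
            # No confident child: keep the coarser label and flag as novel
            return path, True
        node = best
        path.append(node)
    return path, False


tree = {"root": ["Immune", "Epithelial"], "Immune": ["T cell", "B cell", "NK cell"]}


def toy_scorer(cell, parent, children):
    # Toy stand-in: each "cell" already stores per-node child probabilities.
    return cell[parent]


known = {"root": {"Immune": 0.9, "Epithelial": 0.1},
         "Immune": {"T cell": 0.8, "B cell": 0.1, "NK cell": 0.1}}
print(annotate_hierarchically(known, tree, toy_scorer))  # (['Immune', 'T cell'], False)
```

A cell that stops at an internal node keeps a coarser but still valid label, which is the kind of partial annotation a hierarchical tree makes possible.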


Subject(s)
Computational Biology , Gene Expression Profiling , Single-Cell Analysis , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Computational Biology/methods , Humans , Software , Transcriptome , Sequence Analysis, RNA/methods , RNA-Seq/methods , Molecular Sequence Annotation/methods
5.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38388681

ABSTRACT

MOTIVATION: Cell-type annotation of single-cell RNA-sequencing (scRNA-seq) data is a hallmark of biomedical research and clinical application. Current annotation tools usually assume that all well-annotated data are acquired simultaneously and lack the ability to expand their knowledge from new data. Such tools are at odds with the continuous emergence of scRNA-seq data, which calls for a continual cell-type annotation model. In addition, owing to their strong capacity for information integration and their model interpretability, transformer-based pre-trained language models have led to breakthroughs in single-cell biology research. Therefore, systematically combining continual learning with pre-trained language models for cell-type annotation is a natural next step. RESULTS: We herein propose a universal cell-type annotation tool, called CANAL, that continuously fine-tunes a pre-trained language model, trained on a large amount of unlabeled scRNA-seq data, as new well-labeled data emerge. CANAL alleviates the dilemma of catastrophic forgetting in terms of both model inputs and outputs. For model inputs, we introduce an experience replay schema that repeatedly revisits important examples from previous stages during the current training stage. This is achieved through a dynamic example bank with a fixed buffer size. The example bank is class-balanced and effective at retaining cell-type-specific information, particularly facilitating the consolidation of patterns associated with rare cell types. For model outputs, we use representation knowledge distillation to regularize the divergence between the previous and current models, thereby preserving knowledge learned in past training stages. Moreover, our universal annotation framework accommodates new cell types throughout the fine-tuning and testing stages.
We can continuously expand the cell-type annotation library by absorbing new cell types from newly arrived, well-annotated training datasets, as well as automatically identify novel cells in unlabeled datasets. Comprehensive experiments with data streams under various biological scenarios demonstrate the versatility and high model interpretability of CANAL. AVAILABILITY: An implementation of CANAL is available from https://github.com/aster-ww/CANAL-torch. CONTACT: dengmh@pku.edu.cn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Journal Name online.
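CANAL's experience replay relies on a class-balanced example bank of fixed size. One simple way to realize such a bank, assumed here purely for illustration, is to give every cell type seen so far an equal share of the buffer and to fill each share by reservoir sampling, so that abundant types cannot crowd out rare ones. The class `BalancedReplayBuffer` and its methods are hypothetical; CANAL chooses which examples to keep by its own criterion, which this sketch does not reproduce.

```python
import random

# Hypothetical class-balanced replay buffer; not CANAL's implementation.
class BalancedReplayBuffer:
    """Fixed-capacity example bank split evenly across the cell types seen so far."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.rng = random.Random(seed)
        self.per_class = {}  # label -> stored examples
        self.seen = {}       # label -> number of examples offered so far

    def _quota(self):
        # Equal share per class, with at least one slot per class
        return max(1, self.capacity // max(1, len(self.per_class)))

    def _shrink(self):
        quota = self._quota()
        for label, bucket in self.per_class.items():
            if len(bucket) > quota:
                self.per_class[label] = self.rng.sample(bucket, quota)

    def add(self, example, label):
        if label not in self.per_class:
            self.per_class[label] = []
            self.seen[label] = 0
            self._shrink()  # a new class lowers every class's quota
        self.seen[label] += 1
        bucket, quota = self.per_class[label], self._quota()
        if len(bucket) < quota:
            bucket.append(example)
        else:
            # Reservoir sampling: keep a roughly uniform sample of this class's stream
            j = self.rng.randrange(self.seen[label])
            if j < quota:
                bucket[j] = example

    def sample(self, k):
        """Draw up to k (example, label) pairs for replay."""
        pool = [(x, y) for y, b in self.per_class.items() for x in b]
        return self.rng.sample(pool, min(k, len(pool)))

    def __len__(self):
        return sum(len(b) for b in self.per_class.values())


buf = BalancedReplayBuffer(capacity=10)
for i in range(100):
    buf.add(f"t{i}", "T cell")
for i in range(3):
    buf.add(f"r{i}", "rare")
print(len(buf.per_class["T cell"]), len(buf.per_class["rare"]))  # 5 3
```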


Subject(s)
Gene Expression Profiling , Software , Gene Expression Profiling/methods , Single-Cell Gene Expression Analysis , Single-Cell Analysis/methods , Language , Sequence Analysis, RNA/methods
6.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38279647

ABSTRACT

MOTIVATION: The rapid development of spatial transcriptome technologies has enabled researchers to acquire single-cell-level spatial data at an affordable price. However, computational analysis tools, such as annotation tools, tailored for these data are still lacking. Recently, many computational frameworks have emerged to integrate single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics datasets. While some frameworks can utilize well-annotated scRNA-seq data to annotate spatial expression patterns, they overlook critical aspects. First, existing tools do not explicitly consider cell type mapping when aligning the two modalities. Second, current frameworks lack the capability to detect novel cells, which remains a key interest for biologists. RESULTS: To address these problems, we propose an annotation method for spatial transcriptome data called SPANN. The main tasks of SPANN are to transfer cell-type labels from well-annotated scRNA-seq data to newly generated single-cell resolution spatial transcriptome data and discover novel cells from spatial data. The major innovations of SPANN come from two aspects: SPANN automatically detects novel cells from unseen cell types while maintaining high annotation accuracy over known cell types. SPANN finds a mapping between spatial transcriptome samples and RNA data prototypes and thus conducts cell-type-level alignment. Comprehensive experiments using datasets from various spatial platforms demonstrate SPANN's capabilities in annotating known cell types and discovering novel cell states within complex tissue contexts. AVAILABILITY: The source code of SPANN can be accessed at https://github.com/ddb-qiwang/SPANN-torch. CONTACT: dengmh@math.pku.edu.cn.
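The two SPANN tasks, transferring labels from scRNA-seq prototypes and discovering novel cells, can be pictured with a nearest-prototype rule plus a rejection threshold: a spatial cell whose best cosine similarity to any reference prototype falls below a cutoff is labeled "novel". The shared embedding space, the prototypes and the 0.8 cutoff are assumptions of this sketch; SPANN learns its alignment and novelty detection jointly, which is not shown here.

```python
import math

# Nearest-prototype labeling with a novelty cutoff; an illustrative assumption,
# not SPANN's learned procedure. Embeddings are assumed to share one space.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def annotate_spatial(cells, prototypes, min_sim=0.8):
    """Label each spatial cell with its most similar reference cell type, or 'novel'."""
    labels = []
    for c in cells:
        best = max(prototypes, key=lambda t: cosine(c, prototypes[t]))
        labels.append(best if cosine(c, prototypes[best]) >= min_sim else "novel")
    return labels


protos = {"Neuron": [1.0, 0.0, 0.0], "Astrocyte": [0.0, 1.0, 0.0]}
cells = [[0.9, 0.1, 0.0], [0.1, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(annotate_spatial(cells, protos))  # ['Neuron', 'Astrocyte', 'novel']
```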


Subject(s)
Single-Cell Gene Expression Analysis , Transcriptome , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Software
7.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36869836

ABSTRACT

The rapid development of single-cell RNA sequencing (scRNA-seq) technology allows us to study gene expression heterogeneity at the cellular level. Cell annotation is the basis for subsequent downstream analysis in single-cell data mining. As more and more well-annotated scRNA-seq reference data become available, many automatic annotation methods have sprung up in order to simplify the cell annotation process on unlabeled target data. However, existing methods rarely explore the fine-grained semantic knowledge of novel cell types absent from the reference data, and they are usually susceptible to batch effects on the classification of seen cell types. Taking into consideration the limitations above, this paper proposes a new and practical task called generalized cell type annotation and discovery for scRNA-seq data whereby target cells are labeled with either seen cell types or cluster labels, instead of a unified 'unassigned' label. To accomplish this, we carefully design a comprehensive evaluation benchmark and propose a novel end-to-end algorithmic framework called scGAD. Specifically, scGAD first builds the intrinsic correspondences on seen and novel cell types by retrieving geometrically and semantically mutual nearest neighbors as anchor pairs. Together with the similarity affinity score, a soft anchor-based self-supervised learning module is then designed to transfer the known label information from reference data to target data and aggregate the new semantic knowledge within target data in the prediction space. To enhance the inter-type separation and intra-type compactness, we further propose a confidential prototype self-supervised learning paradigm to implicitly capture the global topological structure of cells in the embedding space. Such a bidirectional dual alignment mechanism between embedding space and prediction space can better handle batch effect and cell type shift. 
Extensive results on numerous simulated and real datasets demonstrate the superiority of scGAD over various state-of-the-art clustering and annotation methods. We also perform marker gene identification to validate the effectiveness of scGAD in clustering novel cell types and their biological significance. To the best of our knowledge, we are the first to introduce this new and practical task and propose an end-to-end algorithmic framework to solve it. Our method scGAD is implemented in Python using the PyTorch machine-learning library, and it is freely available at https://github.com/aimeeyaoyao/scGAD.


Subject(s)
Algorithms , Gene Expression Profiling , Gene Expression Profiling/methods , Single-Cell Analysis/methods , Computer Simulation , Cluster Analysis , Sequence Analysis, RNA/methods
8.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35021184

ABSTRACT

With the increasing volume of available human sequencing data, analysis incorporating external controls has become a popular and cost-effective approach to boost statistical power in disease association studies. To prevent spurious associations due to population stratification, it is important to match the ancestry backgrounds of cases and controls. However, rare-variant association tests based on a standard logistic regression model are conservative when all ancestry-matched strata have the same case-control ratio and may become anti-conservative when the case-control ratio varies across strata. Under the conditional logistic regression (CLR) model, we propose a weighted burden test (CLR-Burden), a variance component test (CLR-SKAT) and a hybrid test (CLR-MiST). We show that the CLR model coupled with ancestry matching is a general approach to controlling for population stratification, regardless of the spatial distribution of disease risks. Through extensive simulation studies, we demonstrate that the CLR-based tests robustly control type I error under different matching schemes and are more powerful than the standard Burden, SKAT and MiST tests. Furthermore, because CLR-based tests allow different case-control ratios across strata, a full-matching scheme can be employed to use all available cases and controls efficiently and accelerate the discovery of disease-associated genes.
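A weighted burden test collapses the rare variants in a region into one score per individual, and CLR then models case status within ancestry-matched strata. The sketch below computes weighted burden scores with Beta(1, 25) weights on minor allele frequency (a convention popularized by SKAT; whether CLR-Burden uses these exact weights is an assumption) and evaluates the conditional log-likelihood for the simplest design, strata containing exactly one case each. The paper's method also covers strata with several cases, which require summing over case subsets and are not shown.

```python
import math

# Beta(a, b) MAF weights (SKAT-style convention, assumed here) and a
# conditional log-likelihood restricted to one case per stratum.
def beta_weight(maf, a=1.0, b=25.0):
    """Beta(a, b) density evaluated at the minor allele frequency."""
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * maf ** (a - 1) * (1 - maf) ** (b - 1)


def burden_scores(genotypes, mafs, a=1.0, b=25.0):
    """Weighted burden score per individual.

    genotypes: per-individual lists of minor-allele counts (0, 1 or 2)
    mafs: minor allele frequency of each variant
    """
    weights = [beta_weight(m, a, b) for m in mafs]
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def clr_loglik(beta, strata):
    """Conditional log-likelihood when every stratum has exactly one case.

    strata: list of (case_score, [control_scores]) tuples of burden scores
    """
    ll = 0.0
    for case, controls in strata:
        denom = sum(math.exp(beta * s) for s in [case] + controls)
        ll += beta * case - math.log(denom)
    return ll


print(burden_scores([[1, 0], [0, 2]], [0.01, 0.02]))
```

Matching enters only through the stratum-wise denominator: each case is compared with its own ancestry-matched controls, so stratum-specific baseline risks cancel out of the likelihood.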


Subject(s)
Models, Genetic , Case-Control Studies , Computer Simulation , Humans , Logistic Models
9.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36383167

ABSTRACT

MOTIVATION: Single-cell multi-omics sequencing techniques have developed rapidly in the past few years. Clustering analysis of single-cell multi-omics data may offer novel perspectives for dissecting cellular heterogeneity. However, multi-omics data are inherently high-dimensional and highly sparse, and they contain doublets. Moreover, representations of different omics, even from the same cell, follow diverse distributions. Without proper distribution alignment techniques, clustering methods easily produce poorly separated clusters that are swayed by less informative omics data. RESULTS: We developed MoClust, a novel joint clustering framework that can be applied to several types of single-cell multi-omics data. A selective automatic doublet detection module that identifies and filters out doublets is introduced in the pretraining stage to improve data quality. Omics-specific autoencoders are introduced to characterize the multi-omics data. A contrastive learning approach to distribution alignment is adopted to adaptively fuse omics representations into an omics-invariant representation. This novel form of alignment improves the compactness and separability of clusters while accurately weighting the contribution of each omics to the clustering objective. Extensive experiments on both simulated and real multi-omics datasets demonstrated MoClust's strong alignment, doublet detection and clustering performance. AVAILABILITY AND IMPLEMENTATION: An implementation of MoClust is available from https://doi.org/10.5281/zenodo.7306504. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
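Contrastive distribution alignment between two omics views is commonly implemented with an InfoNCE-style loss: the two embeddings of the same cell form the positive pair, and all other cells in the batch act as negatives. The sketch below assumes that formulation, cosine similarity and a temperature of 0.5, and computes only one direction (view A to view B). MoClust's exact loss, its doublet module and its omics weighting are not reproduced.

```python
import math

# One-directional InfoNCE between two omics views of the same cells; an
# assumed formulation for illustration, not MoClust's exact objective.
def _normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss treating row i of view_a and row i of view_b as positives."""
    a = [_normalize(v) for v in view_a]
    b = [_normalize(v) for v in view_b]
    loss = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        loss += log_denom - logits[i]  # -log softmax probability of the matching cell
    return loss / len(a)


rna = [[1.0, 0.0], [0.0, 1.0]]
protein = [[1.0, 0.1], [0.1, 1.0]]
print(info_nce(rna, protein) < info_nce(rna, protein[::-1]))  # True
```

A symmetric variant averages this loss with `info_nce(view_b, view_a)`; minimizing it pulls the two views of each cell together in a shared space.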


Subject(s)
Data Accuracy , Multiomics , Cluster Analysis
10.
Bioinformatics ; 39(7)2023 07 01.
Article in English | MEDLINE | ID: mdl-37369035

ABSTRACT

MOTIVATION: In recent years, high-throughput sequencing technologies have made large-scale protein sequences accessible. However, their functional annotation usually relies on low-throughput and costly experimental studies. Computational prediction models offer a promising alternative to accelerate this process. Graph neural networks have shown significant progress in protein research, but capturing long-distance structural correlations and identifying key residues in protein graphs remain challenging. RESULTS: In the present study, we propose a novel deep learning model named Hierarchical graph transformEr with contrAstive Learning (HEAL) for protein function prediction. The core feature of HEAL is its ability to capture structural semantics using a hierarchical graph Transformer, which introduces a set of super-nodes mimicking functional motifs to interact with nodes in the protein graph. These semantic-aware super-node embeddings are then aggregated with varying emphasis to produce a graph representation. To optimize the network, we used graph contrastive learning as a regularization technique to maximize the similarity between different views of the graph representation. Evaluation on the PDBch test set shows that HEAL-PDB, trained on less data, achieves performance comparable to recent state-of-the-art methods such as DeepFRI. Moreover, HEAL, with the added benefit of AlphaFold2-predicted structures for proteins lacking resolved structures, outperforms DeepFRI by a significant margin on the Fmax, AUPR and Smin metrics on the PDBch test set. Additionally, when no experimentally resolved structures are available for the proteins of interest, HEAL can still achieve better performance than DeepFRI and DeepGOPlus on the AFch test set by taking advantage of AlphaFold2-predicted structures. Finally, HEAL is capable of finding functional sites through class activation mapping.
AVAILABILITY AND IMPLEMENTATION: Implementations of our HEAL can be found at https://github.com/ZhonghuiGu/HEAL.
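HEAL is compared with DeepFRI on Fmax, the protein-centric maximum F-measure used in the CAFA challenges. A minimal sketch of that metric, assuming predictions are given as per-protein dictionaries of GO term scores and thresholds from 0.01 to 1.00: at each threshold, precision is averaged over proteins with at least one prediction at or above it, recall is averaged over all annotated proteins, and Fmax is the best harmonic mean. Ontology propagation, AUPR and Smin are not shown.

```python
# Protein-centric Fmax (CAFA-style); input format and threshold grid are
# assumptions of this sketch. GO-term propagation is not performed.
def fmax(predictions, truths, steps=100):
    """predictions: dict protein -> dict GO term -> score in [0, 1]
    truths: dict protein -> non-empty set of true GO terms
    """
    best = 0.0
    for k in range(1, steps + 1):
        t = k / steps
        precisions, recalls = [], []
        for prot, true_terms in truths.items():
            pred = {g for g, s in predictions.get(prot, {}).items() if s >= t}
            tp = len(pred & true_terms)
            if pred:  # precision only over proteins with >= 1 prediction
                precisions.append(tp / len(pred))
            recalls.append(tp / len(true_terms))  # recall over all proteins
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


truths = {"P1": {"GO:0000001"}, "P2": {"GO:0000002"}}
perfect = {"P1": {"GO:0000001": 0.9}, "P2": {"GO:0000002": 0.8}}
print(fmax(perfect, truths))  # 1.0
```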


Subject(s)
Benchmarking , High-Throughput Nucleotide Sequencing , Amino Acid Sequence , Neural Networks, Computer , Semantics
11.
Anal Chem ; 95(48): 17750-17758, 2023 12 05.
Article in English | MEDLINE | ID: mdl-37971943

ABSTRACT

A new type of carbon dot (CD)-functionalized solution-gated graphene transistor (SGGT) sensor was designed and fabricated for the highly sensitive and highly selective detection of glutathione (GSH). The CDs were synthesized via a one-step hydrothermal method using DL-thioctic acid and triethylenetetramine (TETA) as sources of S, N, and C. The CDs have abundant amino and carboxyl groups and were used as probes for detecting GSH by modifying the surface of the SGGT gate electrode. Remarkably, the CDs-SGGT sensor exhibited excellent selectivity and ultrahigh sensitivity to GSH, with an ultralow limit of detection (LOD) as low as 10⁻¹⁹ M. To the best of our knowledge, the sensor outperforms previously reported systems. Moreover, the CDs-SGGT sensor shows rapid detection and good stability. More importantly, the detection of GSH in artificial serum samples was successfully demonstrated.


Subject(s)
Graphite , Quantum Dots , Carbon , Limit of Detection , Glutathione
12.
Bioinformatics ; 38(6): 1575-1583, 2022 03 04.
Article in English | MEDLINE | ID: mdl-34999761

ABSTRACT

MOTIVATION: The rapid development of single-cell RNA sequencing (scRNA-seq) makes it possible to study the heterogeneity of individual cell characteristics. Cell clustering is a vital procedure in scRNA-seq analysis, providing insight into complex biological phenomena. However, the noisy, high-dimensional and large-scale nature of scRNA-seq data poses challenges for clustering analysis. To date, many deep learning-based methods have emerged that learn underlying feature representations while clustering. However, these methods are inefficient at identifying rare cell types and can barely make full, integrated use of gene dependencies or cell similarity. As a result, they cannot detect the clear cell type structure that is required for clustering accuracy as well as downstream analysis. RESULTS: Here, we propose a novel scRNA-seq clustering algorithm called scNAME, which incorporates a mask estimation task for gene pertinence mining and a neighborhood contrastive learning framework for exploiting the intrinsic structure of cells. The patterns learned through mask estimation help reveal the uncorrupted data structure and denoise the original single-cell data. In addition, the randomly created augmented data introduced in contrastive learning not only improve the robustness of clustering but also increase the sample size in each cluster for better data capacity. Beyond this, we introduce a neighborhood contrastive paradigm with a global offline memory bank, which encourages discriminative feature representations and achieves intra-cluster compactness together with inter-cluster separation. The combination of the mask estimation task, neighborhood contrastive learning and the global memory bank in scNAME is conducive to rare cell type detection. The experimental results on both simulated and real data confirm that our method is accurate, robust and scalable.
We also perform biological analyses, including marker gene identification, gene ontology and pathway enrichment analysis, to validate the biological significance of our method. To the best of our knowledge, we are among the first to introduce a gene relationship exploration strategy, as well as a global cellular similarity repository, in the single-cell field. AVAILABILITY AND IMPLEMENTATION: An implementation of scNAME is available from https://github.com/aster-ww/scNAME. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
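scNAME's mask estimation task corrupts the input and trains the network to predict which entries were masked. One common way to build such a corruption, assumed here and borrowed from tabular self-supervised learning, is to replace each masked entry with the value of the same gene in a randomly drawn cell, so corrupted values stay on a realistic scale. The function name `mask_corrupt`, the mask rate and the list-of-lists matrix are illustrative choices, not scNAME's code.

```python
import random

# Hypothetical mask-and-replace corruption for a mask estimation pretext task;
# not taken from scNAME.
def mask_corrupt(matrix, mask_rate=0.2, seed=0):
    """Return (corrupted matrix, binary mask) for a cells x genes matrix.

    Each entry is masked with probability mask_rate; a masked entry is
    replaced by the same gene's value in a randomly drawn cell (possibly
    the same cell).
    """
    rng = random.Random(seed)
    n_cells, n_genes = len(matrix), len(matrix[0])
    corrupted, mask = [], []
    for i in range(n_cells):
        row, mrow = [], []
        for j in range(n_genes):
            if rng.random() < mask_rate:
                row.append(matrix[rng.randrange(n_cells)][j])
                mrow.append(1)
            else:
                row.append(matrix[i][j])
                mrow.append(0)
        corrupted.append(row)
        mask.append(mrow)
    return corrupted, mask


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
X_tilde, M = mask_corrupt(X, mask_rate=0.5, seed=1)
```

A network would then take `X_tilde` as input and be trained both to predict `M` and to reconstruct `X`.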


Subject(s)
Gene Expression Profiling , Single-Cell Gene Expression Analysis , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Algorithms , Cluster Analysis
13.
Bioinformatics ; 38(3): 738-745, 2022 01 12.
Article in English | MEDLINE | ID: mdl-34623390

ABSTRACT

MOTIVATION: Single-cell RNA-seq (scRNA-seq) has been widely used to resolve cellular heterogeneity. After collecting scRNA-seq data, the natural next step is to integrate the accumulated data to achieve a common ontology of cell types and states. Thus, an effective and efficient cell-type identification method is urgently needed. Meanwhile, high-quality reference data remain a necessity for precise annotation. However, such tailored reference data are frequently lacking in practice. To address this, we aggregated multiple datasets into a meta-dataset on which annotation is conducted. Existing supervised or semi-supervised annotation methods suffer from batch effects caused by different sequencing platforms, and these effects become more severe with multiple reference datasets. RESULTS: Herein, we introduce a robust deep learning-based single-cell Multiple Reference Annotator (scMRA). In scMRA, a knowledge graph is constructed to represent the characteristics of cell types in different datasets, and a graph convolutional network serves as a discriminator based on this graph. scMRA preserves intra-cell-type closeness and the relative positions of cell types across datasets. scMRA is remarkably powerful at transferring knowledge from multiple reference datasets to the unlabeled target domain, thereby gaining an advantage over other state-of-the-art annotation methods in multi-reference data experiments. Furthermore, scMRA can remove batch effects. To the best of our knowledge, this is the first attempt to use multiple insufficient reference datasets to annotate target data, and, comparatively, scMRA is the best annotation method for multiple scRNA-seq datasets. AVAILABILITY AND IMPLEMENTATION: An implementation of scMRA is available from https://github.com/ddb-qiwang/scMRA-torch. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
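scMRA processes a knowledge graph of cell types from several reference datasets with a graph convolutional network. A single GCN propagation step in the standard Kipf and Welling form, H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W), where D is the degree matrix of A + I, can be sketched in pure Python as below. The graph construction (one node per (dataset, cell type) pair, with edges linking identical cell-type names across datasets) is an assumption for illustration; the abstract does not give scMRA's edge definition.

```python
import math

# One symmetric-normalized GCN layer over an assumed cell-type graph;
# not scMRA's actual graph or network.
def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def gcn_layer(adj, h, w):
    """ReLU(D^-1/2 (A + I) D^-1/2 H W) for dense list-of-lists matrices."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, h), w)
    return [[max(0.0, v) for v in row] for row in out]


def build_celltype_graph(nodes):
    """Connect (dataset, cell type) nodes that share a cell-type name."""
    return [[1.0 if i != j and a[1] == b[1] else 0.0 for j, b in enumerate(nodes)]
            for i, a in enumerate(nodes)]


nodes = [("ref1", "T cell"), ("ref2", "T cell"), ("ref1", "B cell")]
adj = build_celltype_graph(nodes)
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy node features
w = [[1.0, 0.0], [0.0, 1.0]]              # identity weights for clarity
print(gcn_layer(adj, h, w))  # [[0.5, 0.5], [0.5, 0.5], [1.0, 1.0]]
```

The two "T cell" nodes from different references end up with identical features after one step, which illustrates how message passing can pull together the same cell type across datasets.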


Subject(s)
Deep Learning , Gene Expression Profiling , Software , Sequence Analysis, RNA/methods , Single-Cell Gene Expression Analysis , Single-Cell Analysis/methods
14.
PLoS Comput Biol ; 18(1): e1009762, 2022 01.
Article in English | MEDLINE | ID: mdl-35007289

ABSTRACT

Activities of transcription factors (TFs) are temporally modulated to regulate dynamic cellular processes, including development, homeostasis, and disease. Recent developments of bioinformatic tools have enabled the analysis of TF activities using transcriptome data. However, because these methods typically use exon-based target expression levels, the estimated TF activities have limited temporal accuracy. To address this, we proposed a TF activity measure based on intron-level information in time-series RNA-seq data, and implemented it to decode the temporal control of TF activities during dynamic processes. We showed that TF activities inferred from intronic reads can better recapitulate instantaneous TF activities compared to the exon-based measure. By analyzing public and our own time-series transcriptome data, we found that intron-based TF activities improve the characterization of temporal phasing of cycling TFs during circadian rhythm, and facilitate the discovery of two temporally opposing TF modules during T cell activation. Collectively, we anticipate that the proposed approach would be broadly applicable for decoding global transcriptional architecture during dynamic processes.
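A simple way to turn target-gene signal into a TF activity time course, assumed here for illustration, is to z-score each target's intronic read level across time points and then average those z-scores over the TF's targets at each time point. The target sets, the unweighted mean and the toy values are assumptions; the paper's intron-based measure may weight, filter or normalize targets differently.

```python
from statistics import mean, pstdev

# Mean z-scored intronic signal of a TF's targets per time point; an assumed
# simplification, not the paper's exact activity measure.
def zscore(values):
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def tf_activity(intronic, targets):
    """Activity of one TF at each time point.

    intronic: dict gene -> list of intronic read levels over time points
    targets: iterable of target gene names for the TF
    """
    z = [zscore(intronic[g]) for g in targets if g in intronic]
    return [mean(col) for col in zip(*z)]


# Toy time series (hypothetical values) over three time points.
intronic = {"g1": [1.0, 2.0, 3.0], "g2": [2.0, 4.0, 6.0], "g3": [3.0, 2.0, 1.0]}
print(tf_activity(intronic, ["g1", "g2"]))  # rising activity over time
```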


Subject(s)
Gene Expression Profiling/methods, Introns/genetics, Transcription Factors/genetics, Transcriptome/genetics, Animals, Circadian Rhythm/genetics, Computational Biology, Databases, Genetic, Humans, Lymphocyte Activation/genetics, Mice, T-Lymphocytes/metabolism, Transcription Factors/metabolism
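The abstract above does not give the formula of its intron-based measure. As a hedged illustration of the general idea only, the sketch below scores a TF at each time point as the mean z-scored log2 intronic expression of its target genes. The target-set averaging, the log2 pseudocount transform, and the function name tf_activity are all assumptions for illustration, not the authors' method.

```python
import numpy as np

def tf_activity(intron_counts, genes, targets, pseudocount=1.0):
    """Hypothetical TF activity: mean z-scored log2 intronic expression of its targets.

    intron_counts: array (n_genes, n_timepoints) of intronic read counts.
    genes: gene names labelling the rows; targets: target genes of one TF.
    """
    logx = np.log2(np.asarray(intron_counts, dtype=float) + pseudocount)
    mu = logx.mean(axis=1, keepdims=True)
    sd = logx.std(axis=1, keepdims=True)
    # Near-constant genes keep sd = 1 so their z-scores stay close to 0.
    z = (logx - mu) / np.where(sd > 1e-12, sd, 1.0)
    target_set = set(targets)
    rows = [i for i, g in enumerate(genes) if g in target_set]
    return z[rows].mean(axis=0)

genes = ["A", "B", "C"]
counts = [[1, 3, 7], [0, 1, 3], [5, 5, 5]]   # toy intronic counts at 3 time points
print(tf_activity(counts, genes, ["A", "B"]))
```

Swapping intronic for exonic counts in the same function gives an exon-based analogue, which is the kind of comparison the abstract describes; the paper's actual scoring may differ.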
15.
Int J Mol Sci ; 24(23)2023 Nov 21.
Article in English | MEDLINE | ID: mdl-38068885

ABSTRACT

Carotenoids are important pigments in pepper fruits, and the color of each pepper is mainly determined by the composition and content of its carotenoids. The yellow-fruited variety 'ZY' is a natural mutant derived from a branch mutation of 'ZR', which bears fruit of a different color. ZY and ZR differ obviously in fruit color but show no obvious differences in other traits. To investigate the main causes of the different fruit colors, transcriptome and metabolome analyses were performed at three developmental stages (S1-S3) in the two cultivars. The results revealed that structural genes related to carotenoid biosynthesis (PSY1, CRTISO, CCD1, CYP97C1, VDE1, CCS, NCED1 and NCED2) were differentially expressed between the two cultivars. Capsanthin and capsorubin accumulated mainly in ZR and were almost absent in ZY. S2 is the fruit color-changing stage and may be a critical period for the divergence of fruit color between ZY and ZR. The combined transcriptome and metabolome analyses indicated that the CCS, NCED2, AAO4, VDE1 and CYP97C1 genes were key to the differences in total carotenoid content. These new insights into pepper fruit coloration may help to improve fruit breeding strategies.


Subject(s)
Carotenoids, Plant Breeding, Carotenoids/metabolism, Gene Expression Profiling, Fruit/metabolism, Transcriptome, Metabolome, Gene Expression Regulation, Plant
16.
Anal Chem ; 94(7): 3320-3327, 2022 02 22.
Article in English | MEDLINE | ID: mdl-35147418

ABSTRACT

Developing highly sensitive, reliable, and cost-effective label-free DNA biosensors is challenging with traditional fluorescence, electrochemical, and other techniques, most of which require fluorescent or enzymatic labels or other complex modifications. Herein, we fabricate carbon quantum dot (CQD)-functionalized solution-gated graphene transistors for highly sensitive label-free DNA detection. The CQDs are immobilized on the surface of the gate electrode through the thiol group of mercaptoacetic acid. A single-stranded DNA (ssDNA) probe is immobilized on the CQDs through strong π-π interactions. When the ssDNA probe hybridizes with the ssDNA target to form double-stranded DNA, the Dirac voltage shifts and the channel current changes. The limit of detection reaches 1 aM, which is 2-5 orders of magnitude lower than those of previously reported methods. The sensor also exhibits a good linear range from 1 aM to 0.1 nM and good specificity, effectively distinguishing one-base mismatched target DNA. The response time is about 326 s for 1 aM target DNA. This work offers good prospects for biosensor applications.


Subject(s)
Biosensing Techniques, Graphite, Quantum Dots, Biosensing Techniques/methods, Carbon/chemistry, DNA/genetics, DNA, Single-Stranded, Graphite/chemistry, Limit of Detection, Quantum Dots/chemistry
17.
Bioinformatics ; 37(6): 775-784, 2021 05 05.
Article in English | MEDLINE | ID: mdl-33098418

ABSTRACT

MOTIVATION: The rapid development of single-cell RNA sequencing (scRNA-seq) technologies allows us to explore tissue heterogeneity at the cellular level. Cell type identification plays an essential role in the analysis of scRNA-seq data and, in turn, influences the discovery of regulatory genes that induce heterogeneity. As the scale of sequencing data increases, the classical approach of annotating cells by combining clustering with differential expression analysis becomes more costly in both labor and resources. Existing supervised scRNA-seq classification methods can alleviate this issue by training a classifier on labeled reference data and then making predictions on unlabeled target data. However, such a label-transfer strategy carries risks, such as susceptibility to batch effects and further compromise of the inherent discriminability of the target data. RESULTS: In this article, inspired by unsupervised domain adaptation, we propose scSemiCluster, a flexible single-cell semi-supervised clustering and annotation framework that integrates the reference and target data for training. We use structure similarity regularization on the reference domain to restrict the clustering solutions of the target domain. We also incorporate pairwise constraints in the feature learning process so that, in the latent space, cells belonging to the same cluster are close to each other and cells belonging to different clusters are far from each other. Notably, without explicit domain alignment or batch effect correction, scSemiCluster outperforms other state-of-the-art single-cell supervised classification and semi-supervised clustering annotation algorithms on both simulated and real data. To the best of our knowledge, we are the first to use both deep discriminative clustering and deep generative clustering techniques in the single-cell field.
AVAILABILITY AND IMPLEMENTATION: An implementation of scSemiCluster is available from https://github.com/xuebaliang/scSemiCluster. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling, Single-Cell Analysis, Cluster Analysis, RNA-Seq, Sequence Analysis, RNA
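The scSemiCluster abstract above describes pairwise constraints that pull same-cluster cells together and push different-cluster cells apart in the latent space. The sketch below is one generic way to express that idea: squared distance for must-link pairs and a squared hinge with a margin for cannot-link pairs, with pairs derived from reference labels. The loss form, the margin, and the helper names are assumptions for illustration, not scSemiCluster's actual objective.

```python
import numpy as np
from itertools import combinations

def constraints_from_labels(labels):
    """Hypothetical helper: same label -> must-link pair, different label -> cannot-link pair."""
    must, cannot = [], []
    for i, j in combinations(range(len(labels)), 2):
        (must if labels[i] == labels[j] else cannot).append((i, j))
    return must, cannot

def pairwise_constraint_loss(z, must_link, cannot_link, margin=1.0):
    """Average pairwise-constraint loss on latent embeddings z (n_cells x d)."""
    loss = 0.0
    for i, j in must_link:                     # pull same-cluster cells together
        loss += np.sum((z[i] - z[j]) ** 2)
    for i, j in cannot_link:                   # push different clusters >= margin apart
        dist = np.linalg.norm(z[i] - z[j])
        loss += max(0.0, margin - dist) ** 2
    n_pairs = len(must_link) + len(cannot_link)
    return loss / max(n_pairs, 1)

z = np.array([[0.0, 0.0], [0.1, 0.0], [2.0, 0.0]])   # toy 2-D embeddings
must, cannot = constraints_from_labels(["T", "T", "B"])
print(pairwise_constraint_loss(z, must, cannot))
```

Here the two "T" cells are already close and the "B" cell is beyond the margin, so the loss is small; in training, such a term would be minimized jointly with the model's other objectives.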
18.
J Chem Inf Model ; 62(1): 187-195, 2022 01 10.
Article in English | MEDLINE | ID: mdl-34964625

ABSTRACT

Allostery is an important mechanism that biological systems use to regulate function at a distant site. Allosteric drugs have attracted much attention in recent years because of their high specificity and their potential to overcome existing drug-resistance mutations. However, the discovery of allosteric drugs remains challenging because allosteric regulation mechanisms are not clearly understood and allosteric sites cannot be accurately predicted. In this study, we used the Gaussian network model to analyze the dominant modes that determine motion correlations between allosteric and orthosteric sites, and found that these correlations are dominated by either the fast or the slow vibrational modes. Which modes dominate depends on the relative locations of the two sites and on local secondary structure. Based on these analyses, we developed CorrSite2.0, which predicts allosteric sites by taking the maximum of the Z-scores calculated using either the slow or the fast modes. CorrSite2.0 outperformed other commonly used allosteric site prediction methods, with a prediction accuracy above 90.0%. Our study uncovers the relationship among protein structure, dynamics, and allosteric regulation and demonstrates that using the dominant motion modes greatly improves allosteric site prediction accuracy. CorrSite2.0 has been integrated into the CavityPlus web server, which can be accessed at http://www.pkumdl.cn/cavityplus. CorrSite2.0 provides a powerful and user-friendly tool for allosteric drug and protein design.


Subject(s)
Drug Discovery, Proteins, Allosteric Regulation, Allosteric Site, Drug Discovery/methods, Normal Distribution, Proteins/chemistry
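The CorrSite2.0 abstract above combines Gaussian network model (GNM) motion correlations from slow and fast modes by taking the larger Z-score. The sketch below illustrates that idea under stated assumptions: a standard GNM Kirchhoff matrix from C-alpha coordinates with a 7.3 Å cutoff, normalized cross-correlations from a chosen subset of nonzero modes, a candidate score equal to the mean correlation with the orthosteric residues, and Z-scores taken across candidate sites. The cutoff, the number of modes k, the scoring rule, and all function names are assumptions; the coordinates are a random toy chain, not a real protein.

```python
import numpy as np

def kirchhoff(coords, cutoff=7.3):
    """GNM Kirchhoff (connectivity) matrix from C-alpha coordinates."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    gamma = -(dist <= cutoff).astype(float)
    np.fill_diagonal(gamma, 0.0)
    np.fill_diagonal(gamma, -gamma.sum(axis=1))   # diagonal = number of contacts
    return gamma

def mode_correlations(gamma, modes):
    """Normalized cross-correlations from a chosen subset of nonzero GNM modes."""
    vals, vecs = np.linalg.eigh(gamma)            # ascending; vals[0] is the zero mode
    vals, vecs = vals[modes], vecs[:, modes]
    cov = (vecs / vals) @ vecs.T                  # sum_k u_k u_k^T / lambda_k
    d = np.sqrt(np.maximum(np.diag(cov), 1e-12))
    return cov / np.outer(d, d)

def site_zscores(corr, orthosteric, candidates):
    """Z-score of each candidate's mean correlation with the orthosteric site."""
    scores = np.array([corr[np.ix_(site, orthosteric)].mean() for site in candidates])
    return (scores - scores.mean()) / scores.std()

def predict_scores(coords, orthosteric, candidates, k=10):
    """Assumed CorrSite2.0-like score: max of slow-mode and fast-mode Z-scores."""
    gamma = kirchhoff(coords)
    n = len(coords)
    slow = list(range(1, k + 1))                  # skip the zero mode at index 0
    fast = list(range(n - k, n))
    z_slow = site_zscores(mode_correlations(gamma, slow), orthosteric, candidates)
    z_fast = site_zscores(mode_correlations(gamma, fast), orthosteric, candidates)
    return np.maximum(z_slow, z_fast)

# Toy chain: consecutive residues 3.8 Å apart, so the contact graph is connected.
rng = np.random.default_rng(0)
steps = rng.normal(size=(60, 3))
coords = np.cumsum(3.8 * steps / np.linalg.norm(steps, axis=1, keepdims=True), axis=0)
ortho = [0, 1, 2]
cands = [[10, 11, 12], [30, 31], [50, 51, 52]]
print(predict_scores(coords, ortho, cands))
```

Taking the elementwise maximum lets a candidate rank highly if its coupling to the orthosteric site shows up in either mode regime, which mirrors the abstract's finding that the dominant regime varies from site to site.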
19.
Mol Breed ; 42(9): 55, 2022 Sep.
Article in English | MEDLINE | ID: mdl-37313421

ABSTRACT

Tomato spotted wilt virus (TSWV) poses a serious threat to tomato (Solanum lycopersicum) production. In this study, the tomato inbred line YNAU335 was developed without the Sw-5 locus, which confers resistance or immunity to TSWV (absence of infection). Genetic analysis demonstrated that immunity to TSWV in this line was controlled by a dominant nuclear gene. Using bulk segregant analysis and linkage analysis, the candidate gene was mapped to a 20-kb region at the end of the long arm of chromosome 9. Within this region, a chalcone synthase-encoding gene (SlCHS3) was identified as a strong candidate for TSWV resistance. Silencing SlCHS3 reduced flavonoid synthesis, whereas SlCHS3 overexpression increased flavonoid content, and the increase in flavonoids improved TSWV resistance in tomato. These findings indicate that SlCHS3 is indeed involved in the regulation of flavonoid synthesis and plays a significant role in the TSWV resistance of YNAU335. They provide new insights and lay a foundation for analyzing TSWV resistance mechanisms. Supplementary information: The online version contains supplementary material available at 10.1007/s11032-022-01325-5.

20.
Anal Chem ; 93(40): 13673-13679, 2021 10 12.
Article in English | MEDLINE | ID: mdl-34597019

ABSTRACT

Thrombin is an important biomarker for various diseases and biochemical reactions. Rapid, real-time detection of thrombin, which is quickly neutralized in the body during early coagulation, has gained significant attention for practical applications. Solution-gated graphene transistors (SGGTs) have been widely studied for chemical and biological sensing because of their high sensitivity and low-cost fabrication. In this paper, an ssDNA aptamer with 29 bases was immobilized on the surface of the gate electrode to specifically recognize thrombin. The SGGT sensor achieved high sensitivity, with a limit of detection (LOD) down to the fM level. This low LOD was attributed to the amplification function of SGGTs and the choice of a suitable aptamer. The thrombin-induced folding of the ssDNA aptamer and the electropositivity of thrombin molecules could both produce electrical responses of the same sign in the SGGT, helping the device achieve high sensitivity. The channel current variation of the sensor showed a good linear relationship with the logarithm of thrombin concentration over the range of 1 fM to 10 nM. The device also showed a short response time to thrombin: about 150 s for 1 fM thrombin. In summary, aptamer-based SGGT sensing with high sensitivity and high selectivity has good prospects in medical diagnosis.


Subject(s)
Aptamers, Nucleotide, Biosensing Techniques, Graphite, Electrodes, Limit of Detection, Oligonucleotides, Thrombin
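The thrombin abstract above reports that the channel current variation is linear in the logarithm of thrombin concentration over 1 fM to 10 nM. The sketch below shows how such a semi-log calibration line can be fitted and then inverted to estimate an unknown concentration. The response values are synthetic, exactly linear toy numbers chosen for illustration (not measurements from the paper), and the function names are hypothetical.

```python
import numpy as np

def fit_log_calibration(conc_molar, response):
    """Least-squares fit of response = slope * log10(conc) + intercept; also returns R^2."""
    x = np.log10(np.asarray(conc_molar, dtype=float))
    y = np.asarray(response, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r ** 2

def concentration_from_response(response, slope, intercept):
    """Invert the calibration line to estimate concentration (molar)."""
    return 10 ** ((response - intercept) / slope)

# Synthetic calibration points spanning 1 fM to 10 nM (toy values, not real data).
conc = np.logspace(-15, -8, 8)
resp = 0.02 * (np.log10(conc) + 15)           # hypothetical normalized current change
slope, intercept, r2 = fit_log_calibration(conc, resp)
print(slope, intercept, r2)
print(concentration_from_response(0.1, slope, intercept))
```

Because the fit is in log10 space, each decade of concentration changes the response by one slope unit, which is why a semi-log sensor can cover seven orders of magnitude with a single straight line.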