1.
Nat Immunol ; 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38942990

ABSTRACT

The immunological mechanisms underlying chronic colitis are poorly understood. T follicular helper (TFH) cells are critical in helping B cells during germinal center reactions. In a T cell transfer colitis model, a lymphoid structure composed of mature dendritic cells (DCs) and TFH cells was found within T cell zones of colonic lymphoid follicles. TFH cells were required for mature DC accumulation, the formation of DC-T cell clusters and colitis development. Moreover, DCs promoted TFH cell differentiation, contributing to colitis development. A lineage-tracing analysis showed that, following migration to the lamina propria, TFH cells transdifferentiated into long-lived pathogenic TH1 cells, promoting colitis development. Our findings have therefore demonstrated the reciprocal regulation of TFH cells and DCs in colonic lymphoid follicles, which is critical in chronic colitis pathogenesis.

2.
Nat Methods ; 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38844628

ABSTRACT

Large pretrained models have become foundation models, leading to breakthroughs in natural language processing and related fields. Developing foundation models for deciphering the 'languages' of cells and facilitating biomedical research is promising yet challenging. Here we developed a large pretrained model, scFoundation, also named 'xTrimoscFoundationα', with 100 million parameters covering about 20,000 genes, pretrained on over 50 million human single-cell transcriptomic profiles. scFoundation is a large-scale model in terms of the number of trainable parameters, the dimensionality of genes and the volume of training data. Its asymmetric transformer-like architecture and pretraining task design enable it to effectively capture complex contextual relations among genes in a variety of cell types and states. Experiments showed its merit as a foundation model that achieved state-of-the-art performance in a diverse array of single-cell analysis tasks such as gene expression enhancement, tissue drug response prediction, single-cell drug response classification, single-cell perturbation prediction, cell type annotation and gene module inference.

3.
Genome Res ; 33(10): 1788-1805, 2023 10.
Article in English | MEDLINE | ID: mdl-37827697

ABSTRACT

Cell-cell communication (CCC) is critical for determining cell fates and functions in multicellular organisms. With the advent of single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics (ST), an increasing number of CCC inference methods have been developed. Nevertheless, a thorough comparison of their performance is yet to be conducted. To fill this gap, we developed a systematic benchmark framework called ESICCC to evaluate 18 ligand-receptor (LR) inference methods and five ligand/receptor-target inference methods using a total of 116 data sets, including 15 ST data sets, 15 sets of cell line perturbation data, two sets of cell type-specific expression/proteomics data, and 84 sets of sampled or unsampled scRNA-seq data. We evaluated and compared the agreement, accuracy, robustness, and usability of these methods. In the accuracy evaluation, RNAMagnet, CellChat, and scSeqComm emerged as the three best-performing methods for intercellular ligand-receptor inference based on scRNA-seq data, whereas stMLnet and HoloNet were the best methods for predicting ligand/receptor-target regulation using ST data. To facilitate practical applications, we provide a decision-tree-style guideline to help users choose the best tools for their specific research questions in CCC inference, and we developed an ensemble pipeline, CCCbank, that enables versatile combinations of methods and databases. Moreover, our comparative results uncover several critical factors influencing CCC inference, such as prior interaction information, the ligand-receptor scoring algorithm, intracellular signaling complexity, and spatial relationships, which may be considered in future studies to advance the development of new methodologies.
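
Many ligand-receptor inference methods score an LR pair from ligand expression in a sender cell type and receptor expression in a receiver cell type. The sketch below shows one simple scheme under stated assumptions: the product of cluster-mean ligand and receptor expression. It is a generic illustration, not the exact formula of any benchmarked method. The function names and the gene names LIG1/REC1 are hypothetical.

```python
import numpy as np


def cluster_means(expr, labels):
    """Mean expression per cluster: {cluster: vector over genes}."""
    return {c: expr[labels == c].mean(axis=0) for c in np.unique(labels)}


def lr_scores(expr, genes, labels, lr_pairs):
    """Score each (sender, receiver, ligand, receptor) combination as the
    sender's mean ligand expression times the receiver's mean receptor
    expression (an assumed, simplified scoring scheme).

    expr: (n_cells, n_genes) array; genes: gene names matching expr columns;
    labels: (n_cells,) cluster labels; lr_pairs: list of (ligand, receptor).
    """
    idx = {g: i for i, g in enumerate(genes)}
    means = cluster_means(expr, labels)
    scores = {}
    for ligand, receptor in lr_pairs:
        for sender, m_s in means.items():
            for receiver, m_r in means.items():
                scores[(sender, receiver, ligand, receptor)] = float(
                    m_s[idx[ligand]] * m_r[idx[receptor]]
                )
    return scores
```

Real methods add normalization, permutation-based significance, multi-subunit complexes, or spatial constraints on top of a raw score like this one.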


Subjects
Single-Cell Analysis; Software; Ligands; Single-Cell Analysis/methods; Algorithms; Cell Communication/genetics; Sequence Analysis, RNA/methods
4.
Genome Res ; 33(10): 1757-1773, 2023 10.
Article in English | MEDLINE | ID: mdl-37903634

ABSTRACT

Rapid advances in spatial transcriptomics (ST) have revolutionized the interrogation of spatial heterogeneity and increased the demand for comprehensive methods to effectively characterize spatial domains. As a prerequisite for ST data analysis, spatial domain characterization is a crucial step for downstream analyses and biological interpretation. Here we propose a prior-based self-attention framework for spatial transcriptomics (PAST), a variational graph convolutional autoencoder for ST that effectively integrates prior information via a Bayesian neural network, captures spatial patterns via a self-attention mechanism, and scales to large data via a ripple walk sampler strategy. Through comprehensive experiments on data sets generated by different technologies, we show that PAST can effectively characterize spatial domains and facilitate various downstream analyses, including ST visualization, spatial trajectory inference and pseudotime analysis. We also highlight the advantages of PAST for multislice joint embedding and automatic annotation of spatial domains in newly sequenced ST data. Among existing methods, PAST is the first ST method that integrates reference data to analyze ST data. We anticipate that PAST will open up new avenues for researchers to decipher ST data with customized reference data, expanding the applicability of ST technology.


Subjects
Gene Expression Profiling; Transcriptome; Bayes Theorem; Neural Networks, Computer; Spatial Analysis
5.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38493343

ABSTRACT

Recent advancements in single-cell sequencing technologies have generated extensive omics data in various modalities and revolutionized cell research, especially through single-cell RNA and ATAC data. Joint analysis of scRNA-seq and scATAC-seq data has paved the way to understanding cellular heterogeneity and complex cellular regulatory networks. Multi-omics integration is gaining attention as an important step in joint analysis, and the number of computational tools in this field is growing rapidly. In this paper, we benchmarked 12 multi-omics integration methods on three integration tasks via qualitative visualization and quantitative metrics, considering six main aspects that matter in multi-omics data analysis. Overall, we found that different methods have their own advantages in different aspects, while some methods outperformed others in most aspects. We therefore provide guidelines for selecting appropriate methods for specific scenarios and tasks, to help obtain meaningful insights from multi-omics data integration.


Subjects
Benchmarking; Multiomics; Algorithms; Cell Cycle; RNA
6.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38324621

ABSTRACT

Single-cell clustered regularly interspaced short palindromic repeats sequencing (scCRISPR-seq) is an emerging high-throughput CRISPR screening technology in which the true cellular response to perturbation is coupled with the bias in infected proportions of guide RNAs (gRNAs) across different cell clusters. The mixing of these effects introduces noise into scCRISPR-seq data analysis and thus hinders related studies. We developed scDecouple to decouple the true cellular response to perturbation from the influence of infected proportion bias. scDecouple first models the distribution of gene expression profiles in perturbed cells and then iteratively finds the maximum-likelihood estimates of cell cluster proportions as well as the cellular response for each gRNA. We demonstrated its performance in a series of simulation experiments. By applying scDecouple to real scCRISPR-seq data, we found that scDecouple enhances the identification of biologically relevant, perturbation-related genes. scDecouple can benefit scCRISPR-seq data analysis, especially for heterogeneous samples or complex gRNA libraries.
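
scDecouple iteratively estimates maximum-likelihood cluster proportions. The sketch below shows only the generic expectation-maximization (EM) update for mixture proportions, assuming each cell's likelihood under every cluster is already known and fixed. The one-dimensional Gaussian densities and the function names are illustrative assumptions. This is not scDecouple's model, which also estimates the per-gRNA cellular response.

```python
import numpy as np


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution, used here as a fixed cluster likelihood."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))


def em_proportions(lik, n_iter=200, tol=1e-10):
    """EM for mixing proportions only.

    lik: (n_cells, n_clusters) array with lik[i, k] = p(x_i | cluster k),
    assumed known. Returns maximum-likelihood cluster proportions.
    """
    n, k = lik.shape
    pi = np.full(k, 1.0 / k)  # start from uniform proportions
    for _ in range(n_iter):
        # E-step: posterior responsibility of each cluster for each cell
        weighted = lik * pi
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: updated proportions are the mean responsibilities
        new_pi = resp.mean(axis=0)
        if np.max(np.abs(new_pi - pi)) < tol:
            return new_pi
        pi = new_pi
    return pi
```

For example, with 7000 cells drawn around 0 and 3000 around 4, `em_proportions` on the two Gaussian likelihood columns returns proportions close to 0.7 and 0.3.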


Subjects
High-Throughput Screening Assays; RNA, Guide, CRISPR-Cas Systems
7.
Proc Natl Acad Sci U S A ; 120(15): e2216698120, 2023 04 11.
Article in English | MEDLINE | ID: mdl-37023129

ABSTRACT

Discovering DNA regulatory sequence motifs and their relative positions is vital to understanding the mechanisms of gene expression regulation. Although deep convolutional neural networks (CNNs) have achieved great success in predicting cis-regulatory elements, the discovery of motifs and their combinatorial patterns from these CNN models has remained difficult. We show that the main difficulty is due to multifaceted neurons, which respond to multiple types of sequence patterns. Since existing interpretation methods were mainly designed to visualize the class of sequences that can activate a neuron, the resulting visualization corresponds to a mixture of patterns. Such a mixture is usually difficult to interpret without resolving the mixed patterns. We propose the NeuronMotif algorithm to interpret such neurons. Given any convolutional neuron (CN) in the network, NeuronMotif first generates a large sample of sequences capable of activating the CN, which typically consists of a mixture of patterns. Then, the sequences are "demixed" in a layer-wise manner by backward clustering of the feature maps of the involved convolutional layers. NeuronMotif outputs the sequence motifs and the syntax rules governing their combinations, depicted as position weight matrices organized in tree structures. Compared to existing methods, the motifs found by NeuronMotif have more matches to known motifs in the JASPAR database. The higher-order patterns uncovered for deep CNs are supported by the literature and ATAC-seq footprinting. Overall, NeuronMotif enables the deciphering of cis-regulatory codes from deep CNs and enhances the utility of CNNs in genome interpretation.
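
NeuronMotif reports motifs as position weight matrices (PWMs). The sketch below illustrates what a PWM encodes using the textbook construction from aligned, equal-length sequences: per-position base counts with a pseudocount, converted to log2 odds against a uniform background. The pseudocount, the background, and the function names are assumptions. The sketch does not reproduce NeuronMotif's sampling or demixing steps.

```python
import numpy as np

BASES = "ACGT"


def pwm_from_sequences(seqs, pseudocount=0.5, background=None):
    """Return a (length, 4) log2-odds PWM from aligned DNA sequences (A/C/G/T only)."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    counts = np.full((length, 4), pseudocount)
    for s in seqs:
        for pos, base in enumerate(s.upper()):
            counts[pos, BASES.index(base)] += 1
    freqs = counts / counts.sum(axis=1, keepdims=True)
    bg = np.full(4, 0.25) if background is None else np.asarray(background, dtype=float)
    return np.log2(freqs / bg)


def score(pwm, seq):
    """Log-odds score of a sequence with the same length as the PWM."""
    return float(sum(pwm[i, BASES.index(b)] for i, b in enumerate(seq.upper())))
```

A sequence matching the consensus scores highest. In a tree of PWMs like the one the abstract describes, each node would hold one such matrix.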


Subjects
Algorithms; Neural Networks, Computer; Nucleotide Motifs/genetics; Regulatory Sequences, Nucleic Acid/genetics; Databases, Factual
8.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37824741

ABSTRACT

Cell-cell communication events (CEs) are mediated by multiple ligand-receptor (LR) pairs. Usually only a particular subset of CEs directly drives a specific downstream response in a particular microenvironment. We call them the functional communication events (FCEs) of the target responses. Decoding FCE-target gene relations is important for understanding the mechanisms of many biological processes, but has been intractable due to the mixing of multiple factors and the lack of direct observations. We developed a method, HoloNet, for decoding FCEs using spatial transcriptomic data by integrating LR pairs, cell-type spatial distribution and downstream gene expression into a deep learning model. We modeled CEs as a multi-view network, developed an attention-based graph learning method to train the model to generate target gene expression from the CE networks, and decoded the FCEs for specific downstream genes by interpreting the trained models. We applied HoloNet to three Visium datasets of breast cancer and liver cancer. The results disentangled the multiple factors of FCEs by revealing how LR signals and cell types affect specific biological processes, and specified FCE-induced effects in each single cell. We conducted simulation experiments and showed that HoloNet is more reliable in LR prioritization than existing methods. HoloNet is a powerful tool to illustrate cell-cell communication landscapes and reveal vital FCEs that shape cellular phenotypes. HoloNet is available as a Python package at https://github.com/lhc17/HoloNet.


Subjects
Liver Neoplasms; Transcriptome; Humans; Gene Expression Profiling; Cell Communication/genetics; Computer Simulation; Tumor Microenvironment
9.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34472588

ABSTRACT

Quantifying cell proportions, especially those of rare cell types, is of great value in some scenarios for tracking signals associated with certain phenotypes or diseases. Although some methods have been proposed to infer cell proportions from multicomponent bulk data, they are substantially less effective at estimating the proportions of rare cell types, whose estimates are highly sensitive to feature outliers and collinearity. Here we propose a new deconvolution algorithm named ARIC to estimate cell type proportions from gene expression or DNA methylation data. ARIC employs a novel two-step marker selection strategy, consisting of collinear feature elimination based on the component-wise condition number and adaptive removal of outlier markers. This strategy systematically obtains effective markers for weighted $\nu$-support vector regression to ensure robust and precise prediction of rare proportions. We showed that ARIC can accurately estimate fractions in both DNA methylation and gene expression data from different experiments. We further applied ARIC to survival prediction in ovarian cancer and condition monitoring of chronic kidney disease, and the results demonstrate the high accuracy, robustness and clinical potential of ARIC. Taken together, ARIC is a promising tool for solving the deconvolution problem of bulk data in which rare components are of vital importance.
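
ARIC's first selection step removes collinear features using a component-wise condition number, whose exact definition is given in the paper. The sketch below illustrates only the general idea using the ordinary matrix condition number (`np.linalg.cond`). Marker rows of a reference matrix (markers x cell types) are dropped greedily while a removal lowers the condition number. The function name, the stopping rule, and the `max_cond`/`min_markers` parameters are hypothetical.

```python
import numpy as np


def prune_collinear_markers(ref, max_cond=10.0, min_markers=None):
    """Greedily drop marker rows of `ref` (markers x cell types).

    At each step, remove the row whose removal lowers the condition number
    the most. Stop when the condition number is at most `max_cond`, when no
    single removal helps, or when only `min_markers` rows remain.
    Returns the indices of the retained rows.
    """
    n_rows, n_types = ref.shape
    min_markers = n_types if min_markers is None else min_markers
    keep = list(range(n_rows))
    current = np.linalg.cond(ref[keep])
    while current > max_cond and len(keep) > min_markers:
        # Condition number after removing each remaining row in turn
        trial = [np.linalg.cond(ref[[r for r in keep if r != drop]]) for drop in keep]
        best = int(np.argmin(trial))
        if trial[best] >= current:
            break  # no single removal improves conditioning
        current = trial[best]
        del keep[best]  # trial[i] corresponds to keep[i]
    return keep
```

A marker whose profile is a near-copy of the others inflates the largest singular value without adding independent information. Dropping it stabilizes the downstream regression that estimates proportions.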


Subjects
Algorithms; DNA Methylation; Biomarkers; Gene Expression
10.
Bioinformatics ; 39(8)2023 08 01.
Article in English | MEDLINE | ID: mdl-37494428

ABSTRACT

MOTIVATION: Single-cell chromatin accessibility sequencing (scCAS) technology provides an epigenomic perspective for characterizing gene regulatory mechanisms at single-cell resolution. With an increasing number of computational methods proposed for analyzing scCAS data, a powerful simulation framework is desirable for evaluating and validating these methods. However, existing simulators generate synthetic data by sampling reads from real data or mimicking existing cell states, which is inadequate for providing credible ground-truth labels for method evaluation. RESULTS: We present simCAS, an embedding-based simulator for generating high-fidelity scCAS data from both cell-wise and peak-wise embeddings. We demonstrate that simCAS outperforms existing simulators in resembling real data and show that simCAS can generate cells of different states with user-defined cell populations and differentiation trajectories. Additionally, simCAS can simulate data from different batches and encode user-specified interactions of chromatin regions in the synthetic data, providing ground-truth labels beyond cell states. We systematically demonstrate that simCAS facilitates the benchmarking of four core tasks in downstream analysis: cell clustering, trajectory inference, data integration, and cis-regulatory interaction inference. We anticipate that simCAS will be a reliable and flexible simulator for evaluating computational methods for scCAS data. AVAILABILITY AND IMPLEMENTATION: simCAS is freely available at https://github.com/Chen-Li-17/simCAS.
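
simCAS generates synthetic scCAS data from cell-wise and peak-wise embeddings. The heavily simplified sketch below shows that general idea, not simCAS's actual model. The inner product of a cell embedding and a peak embedding is treated as a log-rate, Poisson counts are sampled, and the counts are binarized into accessibility calls. The rate rescaling via `mean_reads`, the embedding dimension, and `simulate_accessibility` itself are assumptions.

```python
import numpy as np


def simulate_accessibility(cell_emb, peak_emb, mean_reads=0.05, seed=0):
    """Sample a binary cells x peaks accessibility matrix.

    cell_emb: (n_cells, d); peak_emb: (n_peaks, d). The log-rate of each
    cell-peak entry is the inner product of the two embeddings. Rates are
    rescaled so their average equals `mean_reads` (an assumed sparsity level).
    """
    rng = np.random.default_rng(seed)
    log_rate = cell_emb @ peak_emb.T
    rate = np.exp(log_rate - log_rate.max())  # subtract the max to avoid overflow
    rate *= mean_reads / rate.mean()
    counts = rng.poisson(rate)
    return (counts > 0).astype(np.int8)
```

Drawing cell embeddings around distinct centers (or along a path) would give the user-defined populations or trajectories the abstract mentions, with the embedding labels serving as ground truth.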


Subjects
Chromatin; Gene Expression Regulation; Computer Simulation; Sequence Analysis, DNA/methods; High-Throughput Nucleotide Sequencing/methods; Single-Cell Analysis/methods
11.
Bioinformatics ; 38(11): 2996-3003, 2022 05 26.
Article in English | MEDLINE | ID: mdl-35394015

ABSTRACT

MOTIVATION: Over the past decade, single-cell technologies have played a crucial role in revolutionizing biological research, strengthening our understanding of cell differentiation, development and regulation from a single-cell perspective. Single-cell RNA sequencing (scRNA-seq) is one of the most common single-cell technologies, enabling the probing of transcriptional states in thousands of cells in one experiment. Identification of cell types from scRNA-seq measurements is a fundamental and crucial task. Most previous studies directly take gene expression as input while ignoring the comprehensive gene-gene interactions. RESULTS: We propose scGraph, an automatic cell identification algorithm that leverages gene interaction relationships to enhance the performance of cell-type identification. scGraph is based on a graph neural network that aggregates the information of interacting genes. In a series of experiments, we demonstrate that scGraph is accurate and outperforms eight comparison methods in the task of cell-type identification. Moreover, scGraph automatically learns gene interaction relationships from biological data, and pathway enrichment analysis shows findings consistent with previous analyses, providing insights into regulatory mechanisms. AVAILABILITY AND IMPLEMENTATION: scGraph is freely available at https://github.com/QijinYin/scGraph and https://figshare.com/articles/software/scGraph/17157743. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Expression Profiling; Single-Cell Analysis; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Gene Expression Profiling/methods; Software; Neural Networks, Computer
12.
Eur Radiol ; 33(2): 893-903, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36001124

ABSTRACT

OBJECTIVES: To quantify intra-tumor heterogeneity (ITH) in non-small cell lung cancer (NSCLC) from computed tomography (CT) images. METHODS: We developed a quantitative ITH measurement, ITHscore, by integrating local radiomic features and global pixel distribution patterns. The associations of ITHscore with tumor phenotypes, genotypes, and patient prognosis were examined in six patient cohorts (n = 1399) to validate its effectiveness in characterizing ITH. RESULTS: For stage I NSCLC, ITHscore was consistent with tumor progression from stage IA1 to IA3 (p < 0.001) and captured key pathological changes in terms of malignancy (p < 0.001). ITHscore distinguished the presence of lymphovascular invasion (p = 0.003) and pleural invasion (p = 0.001) in tumors. ITHscore also separated patient groups with different overall survival (p = 0.004) and disease-free survival (p = 0.005). Radiogenomic analysis showed that the level of ITHscore in stage I and stage II NSCLC is correlated with heterogeneity-related pathways. In addition, ITHscore was shown to be a stable measurement that can be applied to ITH quantification in head-and-neck cancer (HNC). CONCLUSIONS: ITH in NSCLC can be quantified from CT images by ITHscore, which is an indicator of tumor phenotypes and patient prognosis. KEY POINTS: • ITHscore provides a radiomic quantification of intra-tumor heterogeneity in NSCLC. • ITHscore is an indicator of tumor phenotypes and patient prognosis. • ITHscore has the potential to be generalized to other cancer types such as HNC.


Subjects
Carcinoma, Non-Small-Cell Lung; Head and Neck Neoplasms; Lung Neoplasms; Humans; Carcinoma, Non-Small-Cell Lung/pathology; Lung Neoplasms/pathology; Prognosis; Tomography, X-Ray Computed/methods
13.
Nucleic Acids Res ; 49(W1): W483-W490, 2021 07 02.
Article in English | MEDLINE | ID: mdl-33999180

ABSTRACT

Chromatin accessibility, as a powerful marker of active DNA regulatory elements, provides valuable information for understanding regulatory mechanisms. The revolution in high-throughput methods has accumulated massive numbers of chromatin accessibility profiles in public repositories. Nevertheless, utilization of these data is hampered by cumbersome collection, time-consuming processing, and manual chromatin accessibility (openness) annotation of genomic regions. To fill this gap, we developed OpenAnnotate (http://health.tsinghua.edu.cn/openannotate/) as the first web server for efficiently annotating the openness of massive numbers of genomic regions across various biosample types, tissues, and biological systems. In addition to the annotation resource of 2729 comprehensive profiles from 614 biosample types of human and mouse, OpenAnnotate provides user-friendly functionalities, ultra-efficient calculation, real-time browsing, intuitive visualization, and elaborate application notebooks. We show its unique advantages over existing databases and toolkits by effectively revealing cell type specificity, identifying regulatory elements and 3D chromatin contacts, deciphering gene functional relationships, inferring functions of transcription factors, and facilitating single-cell data analyses in unprecedented ways. We anticipate that OpenAnnotate will provide a promising avenue for researchers to construct a more holistic perspective on regulatory mechanisms.


Subjects
Chromatin/metabolism; Genomics/methods; Molecular Sequence Annotation/methods; Software; Internet; Regulatory Sequences, Nucleic Acid; Single-Cell Analysis; Transcription Factors/metabolism
14.
BMC Anesthesiol ; 23(1): 160, 2023 05 09.
Article in English | MEDLINE | ID: mdl-37161402

ABSTRACT

OBJECTIVE: To examine the prognostic value of heart rate variability (HRV) measurements during anesthesia for predicting postoperative clinical outcomes using machine learning models. DATA SOURCES: VitalDB, a comprehensive database of 6388 surgical patients admitted to Seoul National University Hospital. ELIGIBILITY CRITERIA FOR STUDY SELECTION: Cases with an ECG lead II recording duration of less than one hour were excluded. Cases with more than 20% missing HRV measurements were also excluded. A total of 5641 cases were eligible for the analyses. METHODS: Six machine learning models, including Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Gradient Boosting Decision Trees (GBT), Extreme Gradient Boosting (XGB), and an ensemble of the five baseline models, were developed to predict postoperative clinical outcomes. The prediction models were trained using only clinical information, and using both clinical information and HRV features, respectively. Feature importance based on the SHAP method was used to assess the contribution of the HRV measurements to the outcome predictions. Subgroup analysis was also performed to evaluate the risk association between postoperative ICU stay and various HRV measurements such as heart rate, low-frequency power (LFP), and short-term fluctuation DFA $\alpha_1$. RESULT: The final cohort included 5641 unique cases, among whom 4678 (83.0%) were over 40 years of age, 2877 (51.0%) were male, 1073 (19.0%) stayed in the ICU after surgery, 52 (0.9%) died in hospital, and 3167 (56.1%) had a total length of hospital stay longer than 7 days. In the final test set, the highest AUROC with only clinical information was 0.79 for postoperative ICU stay, 0.58 for in-hospital mortality, and 0.76 for total length of hospital stay. Importantly, using both clinical information and HRV features, the AUROC was 0.83, 0.70, and 0.76 for the three clinical outcome predictions, respectively.
Subgroup analysis found that patients with an average heart rate higher than 70, low-frequency power (LFP) < 33, and short-term fluctuation DFA $\alpha_1$ < 0.95 during anesthesia had a significantly higher risk of entering the ICU after surgery. CONCLUSION: This study suggests that HRV measurements during anesthesia are feasible and effective for predicting postoperative clinical outcomes.
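
The short-term HRV feature here is the detrended fluctuation analysis (DFA) scaling exponent computed over short windows, conventionally 4 to 16 beats (α1). The abstract reports a subgroup threshold of 0.95 on it. The sketch below follows the standard first-order DFA recipe on an RR-interval series. The 4 to 16 beat range, the non-overlapping windows, and the function name are assumptions, not necessarily the study's settings.

```python
import numpy as np


def dfa_alpha(rr, scales=range(4, 17)):
    """First-order DFA scaling exponent of an RR-interval series.

    rr: 1-D array of RR intervals. scales: window sizes in beats
    (4-16 beats is the conventional short-term alpha1 range).
    """
    rr = np.asarray(rr, dtype=float)
    profile = np.cumsum(rr - rr.mean())  # integrated, mean-removed series
    sizes, fluct = [], []
    for n in scales:
        n_windows = len(profile) // n
        if n_windows < 2:
            continue
        segments = profile[: n_windows * n].reshape(n_windows, n)
        t = np.arange(n)
        # Fit and subtract a linear trend in each non-overlapping window
        residuals = []
        for seg in segments:
            slope, intercept = np.polyfit(t, seg, 1)
            residuals.append(seg - (slope * t + intercept))
        sizes.append(n)
        fluct.append(np.sqrt(np.mean(np.square(residuals))))  # RMS fluctuation F(n)
    # alpha is the slope of log F(n) against log n
    alpha, _ = np.polyfit(np.log(sizes), np.log(fluct), 1)
    return float(alpha)
```

For uncorrelated RR intervals the exponent is near 0.5. At these short scales, first-order DFA reads slightly higher, around 0.6, because of its known small-window bias. Strongly correlated, random-walk-like series give values well above 1.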


Subjects
Anesthesia; Anesthesiology; Humans; Heart Rate; Hospital Mortality; Prognosis
15.
BMC Bioinformatics ; 23(Suppl 4): 129, 2022 Apr 15.
Article in English | MEDLINE | ID: mdl-35428192

ABSTRACT

BACKGROUND: Drug resistance is a critical obstacle in cancer therapy. Understanding cancer drug response is important for improving anti-cancer drug treatment and guiding anti-cancer drug design. Abundant genomic and drug response resources for cancer cell lines provide unprecedented opportunities for such studies. However, cancer cell lines cannot fully reflect heterogeneous tumor microenvironments. Transferring knowledge learned from in vitro cell lines to single-cell and clinical data is a promising direction for better understanding drug resistance. Most current studies include single nucleotide variants (SNVs) as features and focus on improving the prediction of cancer drug response in cell lines. However, SNVs cannot be obtained reliably from clinical tumor samples and single-cell data, which makes it difficult to generalize such SNV-based models to clinical tumor data or future single-cell studies. RESULTS: We present a new method, DualGCN, a unified Dual Graph Convolutional Network model for predicting cancer drug response. DualGCN encodes both the chemical structures of drugs and the omics data of biological samples using graph convolutional networks. The two embeddings are then fed into a multilayer perceptron to predict drug response. DualGCN incorporates prior knowledge of cancer-related genes and protein-protein interactions, and outperforms most state-of-the-art methods while avoiding the use of large-scale SNV data. CONCLUSIONS: The proposed method outperforms most state-of-the-art methods in predicting cancer drug response without using large-scale SNV data. These favorable results indicate its potential to be extended to clinical and single-cell tumor samples and to advance precision medicine.
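
DualGCN encodes both drug chemical structures and sample omics data with graph convolutional networks. The sketch below shows one graph-convolution layer in the widely used symmetric-normalized form, ReLU(D^(-1/2) (A + I) D^(-1/2) H W), followed by mean pooling into a single graph embedding, applied to a toy molecular graph. The layer form, the pooling, the feature sizes, and the random weights are assumptions, not DualGCN's trained architecture.

```python
import numpy as np


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj, features, weights):
    """One graph-convolution layer: aggregate neighbor features, project, ReLU."""
    return np.maximum(normalized_adjacency(adj) @ features @ weights, 0.0)


def graph_embedding(adj, features, weights):
    """Mean-pool node embeddings into one vector per graph (e.g. per drug)."""
    return gcn_layer(adj, features, weights).mean(axis=0)
```

Per the abstract, the drug embedding and the sample embedding are then combined (for example, by concatenation, which is an assumption here) and passed to a multilayer perceptron that predicts the response.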


Subjects
Antineoplastic Agents; Neoplasms; Antineoplastic Agents/pharmacology; Antineoplastic Agents/therapeutic use; Genomics; Humans; Neoplasms/drug therapy; Neoplasms/genetics; Neural Networks, Computer; Tumor Microenvironment
16.
Bioinformatics ; 37(21): 3964-3965, 2021 11 05.
Article in English | MEDLINE | ID: mdl-34096998

ABSTRACT

SUMMARY: Clustering is a key step in revealing heterogeneities in single-cell data. Most existing single-cell clustering methods output a fixed number of clusters without the hierarchical information. Classical hierarchical clustering (HC) provides dendrograms of cells, but cannot scale to large datasets due to high computational complexity. We present HGC, a fast Hierarchical Graph-based Clustering tool to address both problems. It combines the advantages of graph-based clustering and HC. On the shared nearest-neighbor graph of cells, HGC constructs the hierarchical tree with linear time complexity. Experiments showed that HGC enables multiresolution exploration of the biological hierarchy underlying the data, achieves state-of-the-art accuracy on benchmark data and can scale to large datasets. AVAILABILITY AND IMPLEMENTATION: The R package of HGC is available at https://bioconductor.org/packages/HGC/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
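
HGC builds its hierarchy on a shared nearest-neighbor (SNN) graph of cells. The NumPy sketch below shows one common way to construct such a graph. Each cell's k nearest neighbors are found by Euclidean distance, and the weight between two cells is the number of neighbors they share. The value of k, the brute-force distance matrix, the unnormalized shared-neighbor count, and the function names are assumptions. HGC itself is an R package, and its linear-time tree construction is not reproduced here.

```python
import numpy as np


def knn_indicator(x, k):
    """Boolean (n, n) matrix M with M[i, j] = True if j is among i's k nearest
    neighbors (Euclidean distance, excluding i itself)."""
    sq = np.sum(x ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    nbrs = np.argsort(dist, axis=1)[:, :k]
    m = np.zeros(dist.shape, dtype=bool)
    np.put_along_axis(m, nbrs, True, axis=1)
    return m


def snn_graph(x, k=10):
    """Symmetric SNN weights: number of shared k-nearest neighbors."""
    m = knn_indicator(x, k).astype(int)
    w = m @ m.T  # w[i, j] = |kNN(i) ∩ kNN(j)|
    np.fill_diagonal(w, 0)
    return w
```

Because SNN weights count shared neighborhoods rather than raw distances, well-separated populations end up with no edges between them, which is what makes the graph a convenient substrate for clustering.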


Subjects
Algorithms; Software; Cluster Analysis; Benchmarking; Genetic Heterogeneity
17.
Bioinformatics ; 37(23): 4392-4398, 2021 12 07.
Article in English | MEDLINE | ID: mdl-34165490

ABSTRACT

MOTIVATION: Recent developments in spatial transcriptomic sequencing technologies provide powerful tools for understanding cells in the physical context of tissue microenvironments. A fundamental task in spatial gene expression analysis is to identify genes with spatially variable expression patterns, or spatially variable genes (SVgenes). Several computational methods have been developed for this task, but their high computational complexity limits their scalability to the latest and future large-scale spatial expression data. RESULTS: We present SOMDE, an efficient method for identifying SVgenes in large-scale spatial expression data. SOMDE uses a self-organizing map to cluster neighboring cells into nodes, and then uses a Gaussian process to fit the node-level spatial gene expression to identify SVgenes. Experiments show that SOMDE is about 5-50 times faster than existing methods, with comparable results. The adjustable resolution of SOMDE makes it the only method that can give results in ∼5 min on large datasets of more than 20 000 sequencing sites. SOMDE is available as a Python package on PyPI at https://pypi.org/project/somde, free for academic use. AVAILABILITY AND IMPLEMENTATION: SOMDE is available for download from PyPI, and the source code is openly available from the GitHub repository https://github.com/XuegongLab/somde. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
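
SOMDE first condenses neighboring spots into self-organizing map (SOM) nodes and then fits a Gaussian process at node level. The sketch below covers only the condensation step under simple assumptions:

- a small square SOM is trained on the 2-D spot coordinates;
- each spot is assigned to its best-matching node;
- expression is summed per node.

The grid size, the learning-rate and neighborhood schedules, the per-node summation, and the function names are assumptions, not SOMDE's settings. The Gaussian-process step is omitted.

```python
import numpy as np


def train_som(coords, grid=5, n_iter=2000, seed=0):
    """Train a grid x grid SOM on 2-D spot coordinates; return node positions."""
    rng = np.random.default_rng(seed)
    nodes = coords[rng.choice(len(coords), grid * grid, replace=False)].astype(float)
    gx, gy = np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij")
    lattice = np.column_stack([gx.ravel(), gy.ravel()])  # node positions on the map
    for t in range(n_iter):
        frac = t / n_iter
        lr = 0.5 * (1 - frac) + 0.01          # decaying learning rate
        sigma = grid / 2 * (1 - frac) + 0.5   # shrinking neighborhood radius
        x = coords[rng.integers(len(coords))]
        bmu = np.argmin(np.sum((nodes - x) ** 2, axis=1))  # best-matching unit
        lat_d2 = np.sum((lattice - lattice[bmu]) ** 2, axis=1)
        h = np.exp(-lat_d2 / (2 * sigma ** 2))
        nodes += lr * h[:, None] * (x - nodes)  # pull nodes toward the sample
    return nodes


def condense(coords, expr, nodes):
    """Assign spots to the nearest node; sum expression (spots x genes) per node."""
    d2 = np.sum((coords[:, None, :] - nodes[None, :, :]) ** 2, axis=2)
    assign = np.argmin(d2, axis=1)
    node_expr = np.zeros((len(nodes), expr.shape[1]))
    np.add.at(node_expr, assign, expr)
    return assign, node_expr
```

The speed-up comes from fitting the downstream model on `grid * grid` nodes instead of every spot. The grid size plays the role of the adjustable resolution the abstract describes.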


Subjects
Algorithms; Computational Biology; Computational Biology/methods; Software; Normal Distribution
18.
Bioinformatics ; 37(2): 285-287, 2021 Apr 19.
Article in English | MEDLINE | ID: mdl-33416830

ABSTRACT

SUMMARY: Recent advances in long-term time-lapse microscopy have made it easy for researchers to quantify cell behavior and molecular dynamics at single-cell resolution. However, the lack of easy-to-use software tools optimized for customized research is still a major challenge for quantitatively understanding biological processes through microscopy images. Here, we present CellTracker, highly integrated graphical user interface software for automated cell segmentation and tracking of time-lapse microscopy images. It covers the essential steps in image analysis, including project management, image pre-processing, cell segmentation, cell tracking, manual correction and statistical analysis such as the quantification of cell size and fluorescence intensity. Furthermore, CellTracker provides an annotation tool and supports model training from scratch, thus offering a flexible and scalable solution for customized dataset analysis. AVAILABILITY AND IMPLEMENTATION: CellTracker is open-source software under the GPL-3.0 license. It is implemented in Python and provides an easy-to-use graphical user interface. The source code, instruction manual and demos can be found at https://github.com/WangLabTHU/CellTracker. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

19.
BMC Genomics ; 22(1): 60, 2021 Jan 19.
Article in English | MEDLINE | ID: mdl-33468056

ABSTRACT

BACKGROUND: Efficient regulation of bacterial genes in response to environmental stimuli results in unique gene clusters known as operons. The lack of complete operon references and functional information makes the prediction of metagenomic operons a challenging task, one whose solution would open new perspectives on the interpretation of host-microbe interactions. RESULTS: In this work, we identified whole-genome and metagenomic operons via MetaRon (Metagenome and whole-genome opeRon prediction pipeline). MetaRon identifies operons without any experimental or functional information. MetaRon was applied to datasets with different levels of complexity and information: a whole genome; a simulated mixture of three whole genomes (E. coli MG1655, Mycobacterium tuberculosis H37Rv and Bacillus subtilis str. 16); the E. coli c20 draft genome extracted from chicken gut; and finally 145 whole-metagenome data samples from human gut. MetaRon consistently achieved high operon prediction sensitivity, specificity and accuracy across the E. coli whole genome (97.8, 94.1 and 92.4%), the simulated genome (93.7, 75.5 and 88.1%) and E. coli c20 (87, 91 and 88%), respectively. Finally, we identified 1,232,407 unique operons from 145 paired-end human gut metagenome samples. We also report a strong association of type 2 diabetes with maltose phosphorylase (K00691), 3-deoxy-D-glycero-D-galacto-nononate 9-phosphate synthase (K21279) and an uncharacterized protein (K07101). CONCLUSION: With MetaRon, we were able to overcome two notable limitations of existing whole-genome operon prediction methods: (1) generalizability (the ability to predict operons in unrelated bacterial genomes), and (2) whole-genome and metagenomic data management. We also demonstrate the use of operons as a subset to represent the trends of secondary metabolites in whole-metagenome data and the role of secondary metabolites in the occurrence of disease.
Using operonic data from metagenomes to study secondary metabolic trends significantly reduces the data volume to a smaller, more precise subset. Furthermore, the identification of metabolic pathways associated with the occurrence of type 2 diabetes (T2D) adds another dimension to analyzing the human gut metagenome. To our knowledge, this study is the first organized effort to predict metagenomic operons and perform a detailed analysis in association with a disease, in this case type 2 diabetes. Applying MetaRon to metagenomic data at diverse scales will be beneficial for understanding gene regulation and for therapeutic metagenomics.
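
Operon prediction without experimental information often starts from a simple genomic heuristic: adjacent genes on the same strand separated by a short intergenic distance are grouped into a candidate operon. The sketch below implements only that generic heuristic, not MetaRon's full pipeline. The 50 bp threshold, the `(gene_id, start, end, strand)` tuple layout, and `predict_operons` are assumptions.

```python
from typing import List, Tuple

Gene = Tuple[str, int, int, str]  # (gene_id, start, end, strand), 1-based inclusive


def predict_operons(genes: List[Gene], max_gap: int = 50) -> List[List[str]]:
    """Group consecutive same-strand genes whose intergenic distance is at most
    `max_gap` bp into candidate operons; other genes form their own unit.
    Overlapping genes (negative distance) are also grouped."""
    ordered = sorted(genes, key=lambda g: g[1])
    operons: List[List[str]] = []
    prev = None
    for gene in ordered:
        gene_id, start, _end, strand = gene
        if prev is not None and strand == prev[3] and start - prev[2] - 1 <= max_gap:
            operons[-1].append(gene_id)
        else:
            operons.append([gene_id])
        prev = gene
    return operons
```

For example, two "+" genes separated by 19 bp are merged, a gap of 299 bp starts a new unit, and a strand switch always starts a new unit.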


Subjects
Diabetes Mellitus, Type 2; Metagenomics; Escherichia coli/genetics; Humans; Metagenome; Operon/genetics
20.
Genomics ; 112(3): 2418-2425, 2020 05.
Article in English | MEDLINE | ID: mdl-31981701

ABSTRACT

Alternative splicing contributes to the diversity of gene products by producing multiple transcript variants from one gene. Previous studies have revealed highly variable splicing patterns in single cells, but whether multiple transcript variants of a gene are expressed simultaneously in a single cell remains controversial. Here we show that the dominance of a single transcript variant is a common phenomenon in single cells. We analyzed several single-cell RNA sequencing datasets and observed consistent results. Our results demonstrate that single cells tend to express one major transcript variant of a gene, and that the diversity of transcript variants in cell populations mainly results from the heterogeneity of splicing patterns among single cells.
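
The central quantity here is how much of a gene's expression in a single cell comes from its most abundant transcript variant. The sketch below computes that statistic. The input layout (a cells x isoforms count matrix for one gene), the minimum-read filter, the 0.8 dominance cutoff, and the function names are assumptions for illustration, not the study's exact definitions.

```python
import numpy as np


def dominant_isoform_fraction(counts, min_total=1):
    """Per-cell share of a gene's reads assigned to its top isoform.

    counts: (n_cells, n_isoforms) isoform counts for one gene. Cells with
    fewer than `min_total` reads for the gene get NaN.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum(axis=1)
    frac = np.full(len(counts), np.nan)
    ok = total >= min_total
    frac[ok] = counts[ok].max(axis=1) / total[ok]
    return frac


def share_of_dominant_cells(counts, cutoff=0.8):
    """Fraction of expressing cells whose top isoform reaches `cutoff`."""
    frac = dominant_isoform_fraction(counts)
    frac = frac[~np.isnan(frac)]
    return float(np.mean(frac >= cutoff)) if frac.size else float("nan")
```

Pooling the same counts across cells (`counts.sum(axis=0)`) can show several isoforms with comparable totals even when most cells individually express one. This is the distinction the abstract draws between single cells and cell populations.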


Subjects
Alternative Splicing; RNA Isoforms/metabolism; Animals; Gene Expression; Gene Expression Profiling; Hematopoietic Stem Cells/metabolism; Humans; Mice; Sequence Analysis, RNA; Single-Cell Analysis