Results 1 - 20 of 119
1.
Immunity ; 51(4): 696-708.e9, 2019 10 15.
Article in English | MEDLINE | ID: mdl-31618654

ABSTRACT

Signaling abnormalities in immune responses in the small intestine can trigger chronic type 2 inflammation involving the interaction of multiple immune cell types. To systematically characterize this response, we analyzed 58,067 immune cells from the mouse small intestine by single-cell RNA sequencing (scRNA-seq) at steady state and after induction of a type 2 inflammatory reaction to ovalbumin (OVA). Computational analysis revealed broad shifts in both cell-type composition and cell programs in response to the inflammation, especially in group 2 innate lymphoid cells (ILC2s). Inflammation induced the expression of exon 5 of Calca, which encodes the alpha-calcitonin gene-related peptide (α-CGRP), in intestinal KLRG1+ ILC2s. α-CGRP antagonized KLRG1+ ILC2 proliferation but promoted IL-5 expression. Genetic perturbation of α-CGRP increased the proportion of intestinal KLRG1+ ILC2s. Our work highlights a model in which α-CGRP-mediated neuronal signaling is critical for suppressing ILC2 expansion and maintaining homeostasis of the type 2 immune machinery.


Subjects
Calcitonin Gene-Related Peptide/metabolism, Inflammation/immunology, Intestines/immunology, Lymphocytes/immunology, Neuropeptides/metabolism, Animals, Calcitonin Gene-Related Peptide/genetics, Cells, Cultured, Computational Biology, Immunity, Innate, Interleukin-5/genetics, Interleukin-5/metabolism, Lectins, C-Type/metabolism, Mice, Mice, Inbred BALB C, Mice, Transgenic, Neuropeptides/genetics, Receptors, Immunologic/metabolism, Sequence Analysis, RNA, Signal Transduction, Single-Cell Analysis, Th2 Cells/immunology, Transcriptome, Up-Regulation
2.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37497716

ABSTRACT

Cytometry enables precise single-cell phenotyping within heterogeneous populations. Cell types are traditionally annotated via manual gating, but this approach lacks reproducibility and is sensitive to batch effects. In addition, the most recent cytometers (spectral flow or mass cytometers) create rich, high-dimensional data whose analysis via manual gating becomes challenging and time-consuming. To tackle these limitations, we introduce Scyan (https://github.com/MICS-Lab/scyan), a Single-cell Cytometry Annotation Network that automatically annotates cell types using only prior expert knowledge about the cytometry panel. For this, it uses a normalizing flow (a type of deep generative model) that maps protein expressions into a biologically relevant latent space. We demonstrate that Scyan significantly outperforms related state-of-the-art models on multiple public datasets while being faster and interpretable. In addition, Scyan addresses several complementary tasks, such as batch-effect correction, debarcoding and population discovery. Overall, this model accelerates and eases cell population characterization, quantification and discovery in cytometry.
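
To make the knowledge-driven annotation idea concrete, here is a minimal Python sketch that scores cells against a table of expected marker signs and assigns the best-matching population; it is only a toy stand-in for Scyan's normalizing-flow model, and the marker names, knowledge table and data are all hypothetical.

```python
# Minimal sketch of prior-knowledge-driven annotation (not the Scyan API):
# each population is described by expected marker signs (+1 high, -1 low),
# and every cell is assigned to the population whose expectation best matches
# its standardized protein expression. Names and values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
markers = ["CD3", "CD4", "CD8", "CD19"]
knowledge = {                      # hypothetical panel knowledge table
    "CD4 T cell": [+1, +1, -1, -1],
    "CD8 T cell": [+1, -1, +1, -1],
    "B cell":     [-1, -1, -1, +1],
}

X = rng.normal(size=(1000, len(markers)))          # placeholder expression matrix
Z = (X - X.mean(axis=0)) / X.std(axis=0)           # standardize each marker

names = list(knowledge)
signs = np.array([knowledge[n] for n in names])    # (n_populations, n_markers)
scores = Z @ signs.T                               # agreement of each cell with each population
labels = np.array(names)[scores.argmax(axis=1)]

print(dict(zip(*np.unique(labels, return_counts=True))))
```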


Subjects
Biology, Reproducibility of Results, Flow Cytometry/methods
3.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-37991248

ABSTRACT

The high dimensionality and sparsity of the gene expression matrix in single-cell RNA-sequencing (scRNA-seq) data, coupled with the significant noise generated by shallow sequencing, pose a great challenge for cell clustering methods. While numerous computational methods have been proposed, the majority of existing approaches center on processing the target dataset itself, disregarding the wealth of knowledge present in other species and batches of scRNA-seq data. In light of this, we propose a novel method named graph-based deep embedding clustering (GDEC) that leverages transfer learning across species and batches. GDEC integrates graph convolutional networks, effectively overcoming the challenges posed by sparse gene expression matrices. Additionally, the incorporation of DEC in GDEC enables the partitioning of cell clusters within a lower-dimensional space, thereby mitigating the adverse effects of noise on clustering outcomes. GDEC constructs a model based on existing scRNA-seq datasets and then applies transfer learning techniques to fine-tune the model using a limited amount of prior knowledge gleaned from the target dataset. This empowers GDEC to adeptly cluster scRNA-seq data across different species and batches. Through cross-species and cross-batch clustering experiments, we conducted a comparative analysis between GDEC and conventional packages. Furthermore, we applied GDEC to the scRNA-seq data of uterine fibroids. Compared with the results obtained from the Seurat package, GDEC unveiled a novel cell type (epithelial cells) and identified a notable number of new pathways among various cell types, underscoring the enhanced analytical capabilities of GDEC. Availability and implementation: https://github.com/YuzhiSun/GDEC/tree/main.
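
A rough Python sketch of the graph-smoothing-plus-clustering idea that methods like GDEC build on (one neighbor-averaging pass over a kNN graph, then embedding and clustering); it is not the authors' implementation, and all data and parameters are illustrative.

```python
# Conceptual sketch of graph-based smoothing + embedding + clustering
# (not the GDEC implementation): one round of neighbor averaging over a
# kNN graph, followed by a low-dimensional embedding and clustering.
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.preprocessing import normalize
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = np.log1p(rng.poisson(1.0, size=(500, 2000)).astype(float))   # placeholder sparse counts

A = kneighbors_graph(X, n_neighbors=15, include_self=True)       # sparse adjacency
A = normalize(A, norm="l1", axis=1)                               # row-normalize
X_smooth = A @ X                                                  # graph "convolution" step

embedding = PCA(n_components=20, random_state=0).fit_transform(X_smooth)
clusters = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(embedding)
print(np.bincount(clusters))
```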


Subjects
Gene Expression Profiling, Leiomyoma, Humans, Gene Expression Profiling/methods, Algorithms, Sequence Analysis, RNA/methods, Single-Cell Gene Expression Analysis, Single-Cell Analysis/methods, Cluster Analysis, Machine Learning
4.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36627114

ABSTRACT

Dimension reduction (DR) plays an important role in single-cell RNA sequencing (scRNA-seq) analysis, supporting data interpretation, visualization and other downstream analyses. A desired DR method should be applicable to various application scenarios, including identifying cell types, preserving the inherent structure of the data and handling batch effects. However, most existing DR methods fail to accommodate these requirements simultaneously, especially removing batch effects. In this paper, we develop a novel structure-preserved dimension reduction (SPDR) method using intra- and inter-batch triplet sampling. The constructed triplets jointly consider each anchor's inter-batch mutual nearest neighbors, intra-batch k-nearest neighbors and randomly selected cells from the whole dataset, which capture higher-order structure information while accounting for the batch information of the data. We then minimize a robust loss function over the chosen triplets to obtain a structure-preserved and batch-corrected low-dimensional representation. Comprehensive evaluations show that SPDR outperforms competing DR methods, such as INSCT, IVIS, Trimap, Scanorama, scVI and UMAP, in removing batch effects, preserving biological variation, facilitating visualization and improving clustering accuracy. In addition, the two-dimensional (2D) embedding of SPDR presents a clear and authentic expression pattern, and can guide researchers in determining how many cell types should be identified. Furthermore, SPDR is robust to complex data characteristics (such as down-sampling, duplicates and outliers) and varying hyperparameter settings. We believe that SPDR will be a valuable tool for characterizing complex cellular heterogeneity.
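
The triplet construction described above can be sketched as follows; this is an illustrative Python fragment, not the SPDR code, and the batches, neighborhood sizes and sampling counts are placeholders.

```python
# Illustrative triplet sampling: for each anchor, positives come from
# inter-batch mutual nearest neighbours and intra-batch k-nearest neighbours,
# negatives from randomly sampled cells.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 1, size=(300, 50))      # batch 1 (placeholder PCA space)
X2 = rng.normal(0.5, 1, size=(300, 50))      # batch 2 with a shift

k = 10
nn12 = NearestNeighbors(n_neighbors=k).fit(X2).kneighbors(X1, return_distance=False)
nn21 = NearestNeighbors(n_neighbors=k).fit(X1).kneighbors(X2, return_distance=False)
intra = NearestNeighbors(n_neighbors=k + 1).fit(X1).kneighbors(X1, return_distance=False)[:, 1:]

triplets = []
for a in range(X1.shape[0]):
    mutual = [j for j in nn12[a] if a in nn21[j]]           # inter-batch mutual nearest neighbours
    for p in mutual:                                        # (anchor, positive, random negative)
        triplets.append((("b1", a), ("b2", int(p)), ("b1", int(rng.integers(X1.shape[0])))))
    for p in intra[a][:3]:                                  # a few intra-batch neighbours as positives
        triplets.append((("b1", a), ("b1", int(p)), ("b2", int(rng.integers(X2.shape[0])))))

print(len(triplets), "triplets sampled")
```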


Subjects
Algorithms, Transcriptome, Single-Cell Analysis/methods, Gene Expression Profiling/methods, Cluster Analysis, Sequence Analysis, RNA/methods
5.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36653900

ABSTRACT

Microbial communities are highly dynamic and sensitive to changes in the environment. Thus, microbiome data are highly susceptible to batch effects, defined as sources of unwanted variation that are not related to, and obscure, any factors of interest. Existing batch effect correction methods have been primarily developed for gene expression data. As such, they do not consider the inherent characteristics of microbiome data, including zero inflation, overdispersion and correlation between variables. We introduce new multivariate and non-parametric batch effect correction methods based on Partial Least Squares Discriminant Analysis (PLSDA). PLSDA-batch first estimates treatment and batch variation with latent components, then subtracts the batch-associated components from the data. The resulting batch-effect-corrected data can then be input into any downstream statistical analysis. Two variants are proposed to handle unbalanced batch × treatment designs and to avoid overfitting when estimating the components via variable selection. We compare our approaches with popular batch effect management methods, namely removeBatchEffect, ComBat and Surrogate Variable Analysis, in simulations and three case studies using various visual and numerical assessments. We show that our three methods lead to competitive performance in removing batch variation while preserving treatment variation, especially for unbalanced batch × treatment designs. Our downstream analyses show selections of biologically relevant taxa. This work demonstrates that batch effect correction methods can improve microbiome research outputs. Reproducible code and vignettes are available on GitHub.
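
The core "estimate batch-associated latent components and subtract them" step can be approximated in Python as below; the actual PLSDA-batch method is an R package with additional refinements, so this is only a rough analogue on simulated data.

```python
# Rough analogue of component-based batch removal: fit PLS components that
# explain batch membership, then deflate them from the data matrix.
# All data and parameters are illustrative.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n, p = 120, 200
batch = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, p)) + np.outer(batch, rng.normal(0.8, 0.2, size=p))  # simulated batch shift

Xc = X - X.mean(axis=0)
Y = np.eye(2)[batch]                               # one-hot batch membership
pls = PLSRegression(n_components=1, scale=False).fit(Xc, Y)
T = pls.transform(Xc)                              # batch-associated scores
P = pls.x_loadings_                                # batch-associated loadings
X_corrected = Xc - T @ P.T                         # subtract the batch component

shift = lambda M: np.linalg.norm(M[batch == 1].mean(0) - M[batch == 0].mean(0))
print("batch shift before/after:", round(shift(Xc), 2), round(shift(X_corrected), 2))
```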


Subjects
Microbiota, Research Design, Least-Squares Analysis, Discriminant Analysis
6.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37080771

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) has significantly accelerated the experimental characterization of distinct cell lineages and types in complex tissues and organisms. Cell-type annotation is of great importance in most scRNA-seq analysis pipelines. However, manual cell-type annotation heavily relies on the quality of scRNA-seq data and marker genes, and therefore can be laborious and time-consuming. Furthermore, the heterogeneity of scRNA-seq datasets poses another challenge for accurate cell-type annotation, such as the batch effect induced by different scRNA-seq protocols and samples. To overcome these limitations, here we propose a novel pipeline, termed TripletCell, for cross-species, cross-protocol and cross-sample cell-type annotation. We developed a cell embedding and dimension-reduction module for feature extraction (FE) in TripletCell, namely TripletCell-FE, which leverages a deep metric learning-based algorithm to model the relationships between the reference gene expression matrix and the query cells. Our experimental studies on 21 datasets (covering nine scRNA-seq protocols, two species and three tissues) demonstrate that TripletCell outperformed state-of-the-art approaches for cell-type annotation. More importantly, regardless of protocols or species, TripletCell can deliver outstanding and robust performance in annotating different types of cells. TripletCell is freely available at https://github.com/liuyan3056/TripletCell. We believe that TripletCell is a reliable computational tool for accurately annotating various cell types using scRNA-seq data and will be instrumental in assisting the generation of novel biological hypotheses in cell biology.


Subjects
Algorithms, Single-Cell Analysis, Single-Cell Analysis/methods, Sequence Analysis, RNA/methods, Gene Expression Profiling/methods, Cluster Analysis
7.
J Infect Dis ; 2024 Aug 27.
Article in English | MEDLINE | ID: mdl-39189314

ABSTRACT

As investigations of low-biomass microbial communities have become more common, so too has the recognition of major challenges affecting these analyses. These challenges have been shown to compromise biological conclusions and have contributed to several controversies. Here, we review some of the most common and influential challenges in low-biomass microbiome research. We highlight key approaches to alleviate these potential pitfalls, combining experimental planning strategies and data analysis methods.

8.
BMC Bioinformatics ; 25(1): 181, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38720247

ABSTRACT

BACKGROUND: RNA sequencing combined with machine learning techniques has provided a modern approach to the molecular classification of cancer. Class predictors, reflecting the disease class, can be constructed for known tissue types using the gene expression measurements extracted from cancer patients. One challenge of current cancer predictors is that they often have suboptimal performance estimates when integrating molecular datasets generated from different labs. Often, the quality of the data is variable, procured differently, and contains unwanted noise hampering the ability of a predictive model to extract useful information. Data preprocessing methods can be applied in attempts to reduce these systematic variations and harmonize the datasets before they are used to build a machine learning model for resolving tissue of origin. RESULTS: We aimed to investigate the impact of data preprocessing steps (normalization, batch effect correction and data scaling) through trial and comparison. Our goal was to improve the cross-study prediction of tissue of origin for common cancers on large-scale RNA-Seq datasets derived from thousands of patients and over a dozen tumor types. The results showed that the choice of data preprocessing operations affected the performance of the associated classifier models constructed for tissue-of-origin prediction in cancer. CONCLUSION: By using TCGA as a training set and applying data preprocessing methods, we demonstrated that batch effect correction improved performance, measured by weighted F1-score, in resolving tissue of origin against an independent GTEx test dataset. On the other hand, the use of data preprocessing operations worsened classification performance when the independent test dataset was aggregated from separate studies in ICGC and GEO. Therefore, based on our findings with these publicly available large-scale RNA-Seq datasets, the application of data preprocessing techniques to a machine learning pipeline is not always appropriate.
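
A skeleton of the kind of cross-study evaluation described above, in Python with scikit-learn: train a tissue-of-origin classifier on one cohort, test on an independent cohort and report the weighted F1-score. The data, labels and preprocessing choices are placeholders, not the authors' pipeline.

```python
# Cross-study evaluation skeleton (illustrative only): fit a classifier on a
# training cohort, evaluate on an independent cohort, score with weighted F1.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n_genes = 500
X_train = np.log2(rng.poisson(20, size=(400, n_genes)) + 1)   # placeholder training counts
y_train = rng.integers(0, 5, size=400)                        # 5 hypothetical tissue labels
X_test = np.log2(rng.poisson(25, size=(200, n_genes)) + 1)    # independent cohort with a shift
y_test = rng.integers(0, 5, size=200)

model = make_pipeline(
    StandardScaler(),                     # per-gene scaling, one of the preprocessing choices
    LogisticRegression(max_iter=2000),
)
model.fit(X_train, y_train)
print("weighted F1:", f1_score(y_test, model.predict(X_test), average="weighted"))
```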


Subjects
Machine Learning, Neoplasms, RNA-Seq, Humans, RNA-Seq/methods, Neoplasms/genetics, Transcriptome/genetics, Sequence Analysis, RNA/methods, Gene Expression Profiling/methods, Computational Biology/methods
9.
Biostatistics ; 24(3): 635-652, 2023 Jul 14.
Article in English | MEDLINE | ID: mdl-34893807

ABSTRACT

Nonignorable technical variation is commonly observed across data from multiple experimental runs, platforms, or studies. These so-called batch effects can lead to difficulty in merging data from multiple sources, as they can severely bias the outcome of the analysis. Many groups have developed approaches for removing batch effects from data, usually by accommodating batch variables in the analysis (one-step correction) or by preprocessing the data prior to the formal or final analysis (two-step correction). One-step correction is often desirable due to its simplicity, but its flexibility is limited and it can be difficult to include batch variables uniformly when an analysis has multiple stages. Two-step correction allows for richer models of batch mean and variance. However, prior investigation has indicated that two-step correction can lead to incorrect statistical inference in downstream analysis. Generally speaking, two-step approaches introduce a correlation structure in the corrected data, which, if ignored, may lead to either exaggerated or diminished significance in downstream applications such as differential expression analysis. Here, we provide more intuitive and more formal evaluations of the impacts of two-step batch correction than the existing literature. We demonstrate that the undesired impacts of two-step correction (exaggerated or diminished significance) depend on both the nature of the study design and the batch effects. We also provide strategies for overcoming these negative impacts in downstream analyses using the estimated correlation matrix of the corrected data. We compare the results of our proposed workflow with the results from other published one-step and two-step methods and show that our methods lead to more consistent false discovery control and power of detection across a variety of batch effect scenarios. Software for our method is available through GitHub (https://github.com/jtleek/sva-devel) and will be available in future versions of the sva R package in the Bioconductor project (https://bioconductor.org/packages/release/bioc/html/sva.html).
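
The correlation structure induced by two-step correction can be demonstrated with a small simulation; here per-batch mean-centering stands in for richer two-step methods, and the empirical between-sample correlation within a batch is compared with the theoretical value of -1/(n_b - 1).

```python
# Toy illustration: a two-step correction (simple per-batch mean-centering,
# standing in for richer methods) induces correlation among corrected samples
# from the same batch, approximately -1/(n_b - 1) per gene.
import numpy as np

rng = np.random.default_rng(0)
n_batch, n_per_batch, n_genes = 2, 10, 5000
batch = np.repeat(np.arange(n_batch), n_per_batch)
Y = rng.normal(size=(n_batch * n_per_batch, n_genes))
Y += rng.normal(0, 1, size=(n_batch, n_genes))[batch]    # additive per-batch, per-gene effect

# two-step correction: subtract each batch's mean, gene by gene
Y_corr = Y.copy()
for b in range(n_batch):
    Y_corr[batch == b] -= Y_corr[batch == b].mean(axis=0)

# empirical correlation between two corrected samples of the same batch, across genes
i, j = 0, 1                                              # both in batch 0
emp = np.corrcoef(Y_corr[i], Y_corr[j])[0, 1]
print(f"empirical: {emp:.3f}   theoretical: {-1 / (n_per_batch - 1):.3f}")
```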


Subjects
Gene Expression, Humans, Phylogeny, Research Design
10.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-36088543

ABSTRACT

Ensemble learning is a machine learning approach that integrates multiple base learners to achieve higher accuracy. Recently, single machine learning methods have been established to predict survival for patients with cancer; however, a robust, highly accurate ensemble learning model for identifying high-risk patients has been lacking. To achieve this, we proposed a novel genetic algorithm-aided three-stage ensemble learning method (3S score) for survival prediction. During the construction of the 3S score, double training sets were used to avoid over-fitting, the gene-pairing method was applied to reduce batch effects, and a genetic algorithm was employed to select the best combination of base learners. When used to predict the survival state of glioma patients, this model achieved the highest C-index (0.697) as well as areas under the receiver operating characteristic curve (ROC-AUCs) (first year = 0.705, third year = 0.825 and fifth year = 0.839) in the combined test set (n = 1191), compared with 12 other baseline models. Furthermore, the 3S score distinguished survival significantly in eight of the nine independent test cohorts (P < 0.05), achieving significant improvements in ROC-AUCs. Notably, ablation experiments demonstrated that the gene-pairing method, the double training sets and the genetic algorithm ensure the robustness and effectiveness of the 3S score. A pan-cancer evaluation showed that the 3S score has excellent survival prediction ability in five cancer types, as verified by Cox regression, survival curves and ROC curves. To enable its clinical adoption, we implemented the 3S score together with two clinical factors as an easy-to-use web tool for risk scoring and therapy stratification in glioma patients.
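
The gene-pairing idea mentioned above can be illustrated in a few lines: within-sample comparisons of the form "expression of gene a exceeds that of gene b" are unchanged by any monotone per-sample distortion, which is why they dampen batch effects. The sketch below is illustrative, not the 3S score implementation.

```python
# Gene-pairing features: encode each sample by within-sample comparisons
# "expr(gene a) > expr(gene b)", which are invariant to monotone per-sample
# distortions such as scaling/offset batch effects. Gene pairs chosen at random.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_genes, n_pairs = 200, 100, 300
X = rng.lognormal(size=(n_samples, n_genes))

# a batch effect that rescales and shifts each sample monotonically
scale = rng.uniform(0.5, 2.0, size=(n_samples, 1))
X_batch = X * scale + 0.1 * scale

pairs = rng.integers(0, n_genes, size=(n_pairs, 2))
pair_features = lambda M: (M[:, pairs[:, 0]] > M[:, pairs[:, 1]]).astype(int)

print("features identical despite batch distortion:",
      np.array_equal(pair_features(X), pair_features(X_batch)))
```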


Subjects
Glioma, Machine Learning, Glioma/genetics, Humans, ROC Curve, Risk Factors
11.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35048125

ABSTRACT

Normalization and batch correction are critical steps in processing single-cell RNA sequencing (scRNA-seq) data, removing technical effects and systematic biases to unmask the biological signals of interest. Although a number of computational methods have been developed, there is no guidance for choosing appropriate procedures in different scenarios. In this study, we assessed the performance of 28 scRNA-seq noise reduction procedures in 55 scenarios using simulated and real datasets. The scenarios accounted for multiple biological and technical factors that greatly affect denoising performance, including the relative magnitude of batch effects, the extent of cell population imbalance, the complexity of cell group structures, the proportion and similarity of nonoverlapping cell populations, dropout rates and variable library sizes. We used multiple quantitative metrics and visualization of low-dimensional cell embeddings to evaluate performance on batch mixing while preserving the original cell group and gene structures. Based on our results, we specified the technical or biological factors affecting the performance of each method and recommended proper methods for different scenarios. In addition, we highlighted one challenging scenario where most methods failed and resulted in overcorrection. Our study not only provides a comprehensive guideline for selecting suitable noise reduction procedures but also points out unsolved issues in the field, especially the urgent need for metrics that assess batch correction with respect to imperceptible cell-type mixing.
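
One example of the kind of quantitative batch-mixing metric referred to above is the average entropy of batch labels among each cell's nearest neighbors; the sketch below is illustrative and not the exact set of metrics used in the study.

```python
# Batch-mixing entropy: mean per-cell entropy of batch composition among the
# k nearest neighbours of the corrected embedding (higher = better mixed).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def batch_mixing_entropy(embedding, batch_labels, k=30):
    batches = np.unique(batch_labels)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(embedding)
    idx = nn.kneighbors(embedding, return_distance=False)[:, 1:]   # drop self
    entropies = []
    for neighbors in idx:
        p = np.array([(batch_labels[neighbors] == b).mean() for b in batches])
        p = p[p > 0]
        entropies.append(-(p * np.log(p)).sum())
    return float(np.mean(entropies)) / np.log(len(batches))        # normalize to [0, 1]

rng = np.random.default_rng(0)
well_mixed = rng.normal(size=(600, 10))
labels = rng.integers(0, 2, size=600)
separated = well_mixed + labels[:, None] * 5.0                     # strong residual batch effect
print(batch_mixing_entropy(well_mixed, labels), batch_mixing_entropy(separated, labels))
```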


Subjects
Single-Cell Analysis, Software, Gene Expression Profiling/methods, Gene Library, Sequence Analysis, RNA/methods, Single-Cell Analysis/methods, Exome Sequencing
12.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-36089561

ABSTRACT

We present a novel self-supervised Contrastive LEArning framework for single-cell ribonucleic acid (RNA)-sequencing (CLEAR) data representation and downstream analysis. Compared with current methods, CLEAR overcomes the heterogeneity of the experimental data with a specifically designed representation learning task and thus can handle batch effects and dropout events simultaneously. It achieves superior performance on a broad range of fundamental tasks, including clustering, visualization, dropout correction, batch effect removal and pseudo-time inference. The proposed method successfully identifies and illustrates inflammation-related mechanisms in a COVID-19 disease study with 43,695 single cells from peripheral blood mononuclear cells.
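
A minimal numpy sketch of the contrastive (NT-Xent-style) objective that frameworks like CLEAR build on: two randomly masked views of each cell form a positive pair and all other cells act as negatives. This is not the CLEAR code; the augmentation and temperature are illustrative.

```python
# NT-Xent-style contrastive objective on two augmented views of each cell.
import numpy as np

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent loss for paired embeddings z1[i] <-> z2[i]."""
    z = np.vstack([z1, z2])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)           # cosine similarity space
    sim = z @ z.T / temperature
    n = z1.shape[0]
    np.fill_diagonal(sim, -np.inf)                             # exclude self-similarity
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # index of each row's positive
    log_prob = sim[np.arange(2 * n), pos] - np.log(np.exp(sim).sum(axis=1))
    return float(-log_prob.mean())

rng = np.random.default_rng(0)
X = np.log1p(rng.poisson(1.0, size=(128, 200)).astype(float))
mask = lambda M: M * (rng.random(M.shape) > 0.2)               # random dropout as augmentation
view1, view2 = mask(X), mask(X)
print("NT-Xent loss on raw (untrained) representations:", nt_xent(view1, view2))
```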


Subjects
COVID-19, RNA, COVID-19/genetics, Cluster Analysis, Data Analysis, Humans, Leukocytes, Mononuclear, RNA-Seq, Sequence Analysis, RNA/methods
13.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35901449

ABSTRACT

Integration of single-cell transcriptome datasets from multiple sources plays an important role in investigating complex biological systems. The key to the integration of transcriptome datasets is batch effect removal. Recent methods attempt to apply a contrastive learning strategy to correct batch effects. Despite their encouraging performance, the optimal contrastive learning framework for batch effect removal is still under exploration. We develop an improved contrastive learning-based batch correction framework, GLOBE. GLOBE defines adaptive translation transformations for each cell to guarantee the stability of approximating batch effects. To enhance the consistency of representation alignment, GLOBE utilizes a loss function that is both hardness-aware and consistency-aware to learn batch effect-invariant representations. Moreover, GLOBE computes a batch-corrected gene expression matrix in a transparent way to support diverse downstream analyses. Benchmarking results on a wide spectrum of datasets show that GLOBE outperforms other state-of-the-art methods in terms of robust batch mixing and superior conservation of biological signals. We further apply GLOBE to integrate two developing mouse neocortex datasets and show that GLOBE succeeds in removing batch effects while preserving the contiguous structure of cells in the raw data. Finally, a comprehensive study is conducted to validate the effectiveness of GLOBE.


Subjects
Benchmarking, Transcriptome, Animals, Mice
14.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35821114

ABSTRACT

Developments in single-cell RNA sequencing (scRNA-seq) technologies have enabled biological discoveries at single-cell resolution with high throughput. However, large scRNA-seq datasets always suffer from massive technical noise, including batch effects and dropouts, and the dropout is often batch-dependent. Most existing methods address only one of these problems, and we show that popularly used methods fail to trade off batch effect correction and dropout imputation. Here, inspired by the idea of causal inference, we propose a novel propensity score matching method for scRNA-seq data (scPSM) that borrows information and takes a weighted average from similar cells in the deeply sequenced batch, which simultaneously removes the batch effect, imputes dropouts and denoises data in the entire gene expression space. The proposed method is evaluated on two simulated datasets and a variety of real scRNA-seq datasets, and the results show that scPSM is superior to other state-of-the-art methods. First, scPSM improves clustering accuracy and mixes cells of the same type, suggesting its ability to keep cell types separated while correcting for batch. Besides, using the scPSM-integrated data as input yields results free of batch effects or dropouts in differential expression analysis. Moreover, scPSM not only achieves ideal denoising but also preserves real biological structure for downstream gene-based analyses. Furthermore, scPSM is robust to hyperparameters and to small datasets with few cells but a large number of genes. Comprehensive evaluations demonstrate that scPSM jointly provides desirable batch effect correction, imputation and denoising for recovering the biologically meaningful expression in scRNA-seq data.
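
The propensity-score-matching idea can be sketched as follows: estimate each cell's propensity of belonging to the deeply sequenced batch, then replace each shallow-batch cell by a weighted average of propensity-similar deep-batch cells. This is a conceptual sketch on simulated data, not the scPSM implementation.

```python
# Conceptual propensity-score matching across batches (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
deep = np.log1p(rng.poisson(5.0, size=(400, 300)).astype(float))    # deeply sequenced batch
shallow = np.log1p(rng.poisson(1.0, size=(400, 300)).astype(float)) # sparse batch with dropout

X = np.vstack([deep, shallow])
batch = np.array([0] * len(deep) + [1] * len(shallow))
pcs = PCA(n_components=20, random_state=0).fit_transform(X)
prop = LogisticRegression(max_iter=1000).fit(pcs, batch).predict_proba(pcs)[:, 0]

k = 15
corrected = np.empty_like(shallow)
for i, p in enumerate(prop[len(deep):]):                    # cells of the shallow batch
    nearest = np.argsort(np.abs(prop[: len(deep)] - p))[:k] # propensity-matched deep cells
    w = 1.0 / (np.abs(prop[nearest] - p) + 1e-6)            # weight by propensity similarity
    corrected[i] = (w[:, None] * deep[nearest]).sum(axis=0) / w.sum()

print(corrected.shape)
```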


Subjects
Gene Expression Profiling, Single-Cell Analysis, Cluster Analysis, Propensity Score, Sequence Analysis, RNA/methods, Single-Cell Analysis/methods, Software
15.
Mol Syst Biol ; 19(6): e11490, 2023 06 12.
Article in English | MEDLINE | ID: mdl-37063090

ABSTRACT

High-content image-based cell phenotyping provides fundamental insights into a broad variety of life science disciplines. Striving for accurate conclusions and meaningful impact demands high reproducibility standards, with particular relevance for high-quality open-access data sharing and meta-analysis. However, the sources and degree of biological and technical variability, and thus the reproducibility and usefulness of meta-analysis of results from live-cell microscopy, have not been systematically investigated. Here, using high-content data describing features of cell migration and morphology, we determine the sources of variability across different scales, including between laboratories, persons, experiments, technical repeats, cells and time points. Significant technical variability occurred between laboratories and, to a lesser extent, between persons, limiting the value of direct meta-analysis of data from different laboratories. However, batch effect removal markedly improved the possibility of combining image-based datasets of perturbation experiments. Thus, reproducible quantitative high-content cell image analysis of perturbation effects, and meta-analysis thereof, depend on standardized procedures combined with batch correction.


Subjects
Reproducibility of Results, Cell Movement
16.
Methods ; 220: 61-68, 2023 12.
Article in English | MEDLINE | ID: mdl-37931852

ABSTRACT

Spatial transcriptomics is a rapidly evolving field that enables researchers to capture comprehensive molecular profiles while preserving information about the physical locations. One major challenge in this research area is the identification of spatial domains, distinct regions characterized by unique gene expression patterns. However, current unsupervised methods have struggled to perform well in this regard due to the high levels of noise and dropout events in spatial transcriptomic profiles. In this paper, we propose a novel hexagonal Convolutional Neural Network (hexCNN) for hexagonal image segmentation on spatially resolved transcriptomics. To address the problem of noise and dropout occurrences within spatial transcriptomics data, we first extend an unsupervised algorithm to a supervised learning method that can identify useful features and reduce the hindrance of noise. Then, inspired by the classical convolution in convolutional neural networks (CNNs), we designed a regular hexagonal convolution to compensate for missing gene expression patterns from adjacent spots. We evaluated the performance of hexCNN by applying it to the DLPFC dataset. The results show that hexCNN achieves a classification accuracy of 86.8% and an adjusted Rand index (ARI) of 77.1% (1.4% and 2.5% higher than those of GNNs). The results also demonstrate that hexCNN is capable of removing the noise caused by batch effects while preserving the biological signal differences.
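
A toy version of the hexagonal convolution described above: spots are indexed on an axial hexagonal grid and each output value is a weighted sum over a spot's six neighbors plus itself. This is illustrative only, not the hexCNN code.

```python
# Toy hexagonal convolution on an axial hex grid (illustrative sketch).
import numpy as np

AXIAL_NEIGHBORS = [(+1, 0), (+1, -1), (0, -1), (-1, 0), (-1, +1), (0, +1)]

def hex_conv(values, weights):
    """values: dict (q, r) -> feature value; weights: 7 floats (center + 6 neighbours)."""
    out = {}
    for (q, r), v in values.items():
        acc = weights[0] * v
        for w, (dq, dr) in zip(weights[1:], AXIAL_NEIGHBORS):
            acc += w * values.get((q + dq, r + dr), 0.0)   # zero-pad missing spots
        out[(q, r)] = acc
    return out

rng = np.random.default_rng(0)
spots = {(q, r): float(rng.normal()) for q in range(-5, 6) for r in range(-5, 6)}
smoothed = hex_conv(spots, weights=np.full(7, 1 / 7))      # uniform kernel = hexagonal smoothing
print(len(smoothed), smoothed[(0, 0)])
```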


Subjects
Algorithms, Gene Expression Profiling, Neural Networks, Computer, Image Processing, Computer-Assisted
17.
BMC Bioinformatics ; 24(1): 86, 2023 Mar 07.
Article in English | MEDLINE | ID: mdl-36882691

ABSTRACT

BACKGROUND: We developed a novel approach to minimize batch effects when assigning samples to batches. Our algorithm selects a batch allocation, among all possible ways of assigning samples to batches, that minimizes differences in average propensity score between batches. This strategy was compared to randomization and stratified randomization in a case-control study (30 per group) with a covariate (case vs control, represented as β1, set to be null) and two biologically relevant confounding variables (age, represented as β2, and hemoglobin A1c (HbA1c), represented as β3). Gene expression values were obtained from a publicly available dataset of expression data obtained from pancreas islet cells. Batch effects were simulated as twice the median biological variation across the gene expression dataset and were added to the publicly available dataset to simulate a batch effect condition. Bias was calculated as the absolute difference between observed betas under the batch allocation strategies and the true beta (no batch effects). Bias was also evaluated after adjustment for batch effects using ComBat as well as a linear regression model. In order to understand performance of our optimal allocation strategy under the alternative hypothesis, we also evaluated bias at a single gene associated with both age and HbA1c levels in the 'true' dataset (CAPN13 gene). RESULTS: Pre-batch correction, under the null hypothesis (β1), maximum absolute bias and root mean square (RMS) of maximum absolute bias, were minimized using the optimal allocation strategy. Under the alternative hypothesis (β2 and β3 for the CAPN13 gene), maximum absolute bias and RMS of maximum absolute bias were also consistently lower using the optimal allocation strategy. ComBat and the regression batch adjustment methods performed well as the bias estimates moved towards the true values in all conditions under both the null and alternative hypotheses. Although the differences between methods were less pronounced following batch correction, estimates of bias (average and RMS) were consistently lower using the optimal allocation strategy under both the null and alternative hypotheses. CONCLUSIONS: Our algorithm provides an extremely flexible and effective method for assigning samples to batches by exploiting knowledge of covariates prior to sample allocation.
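
The allocation strategy can be sketched as below, with a random search over candidate assignments standing in for the authors' optimization over all possible allocations; the covariates, propensity model and batch sizes are illustrative.

```python
# Illustrative allocation search: pick the assignment of samples to batches that
# minimizes the spread of batch-average propensity scores.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, n_batches = 60, 4
group = np.repeat([0, 1], n // 2)                      # case vs control
age = rng.normal(55, 10, size=n)
hba1c = rng.normal(6.0, 1.0, size=n)
covars = np.column_stack([age, hba1c])

# propensity of being a case, given the confounders
prop = LogisticRegression(max_iter=1000).fit(covars, group).predict_proba(covars)[:, 1]

def spread(assignment):
    means = [prop[assignment == b].mean() for b in range(n_batches)]
    return max(means) - min(means)

best, best_spread = None, np.inf
for _ in range(20000):                                 # random search over candidate allocations
    assignment = rng.permutation(np.repeat(np.arange(n_batches), n // n_batches))
    s = spread(assignment)
    if s < best_spread:
        best, best_spread = assignment, s

print("max difference in batch-mean propensity:", round(best_spread, 4))
```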


Subjects
Algorithms, Health Status, Propensity Score, Case-Control Studies, Glycated Hemoglobin, Humans
18.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33073843

ABSTRACT

Single-cell mRNA sequencing has been adopted as a powerful technique for understanding gene expression profiles at the single-cell level. However, challenges remain due to factors such as the inefficiency of mRNA molecular capture, technical noise and separate sequencing of cells in different batches. Normalization methods have been developed to ensure a relatively accurate analysis. This work presents a survey of 10 tools specifically designed for single-cell mRNA sequencing data preprocessing, of which six are used for dropout normalization and four for batch effect correction. In this survey, we outline the main methodology of each tool, and we also compare the tools to evaluate their normalization performance on datasets simulated under the constraints of dropout inefficiency, batch effects or their combination. We found that Saver and Baynorm performed better than other methods in dropout normalization in most cases. Beer and Batchelor performed better in batch effect normalization, and the Saver-Beer and Baynorm-Beer combinations performed better in mixed dropout-and-batch-effect normalization. Over-normalization is a common issue with these dropout normalization tools and is worth further investigation. For the batch normalization tools, the capability to retain heterogeneity between different groups of cells after normalization is another direction for future improvement.


Subjects
Gene Expression Profiling, Oligonucleotide Array Sequence Analysis, RNA, Messenger, Single-Cell Analysis, Software, Transcriptome, RNA, Messenger/biosynthesis, RNA, Messenger/genetics
19.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32591778

ABSTRACT

Batch effect correction has been recognized to be indispensable when integrating single-cell RNA sequencing (scRNA-seq) data from multiple batches. State-of-the-art methods ignore single-cell cluster label information, but such information can improve the effectiveness of batch effect correction, particularly under realistic scenarios where biological differences are not orthogonal to batch effects. To address this issue, we propose SMNN for batch effect correction of scRNA-seq data via supervised mutual nearest neighbor detection. Our extensive evaluations in simulated and real datasets show that SMNN provides improved merging within the corresponding cell types across batches, leading to reduced differentiation across batches over MNN, Seurat v3 and LIGER. Furthermore, SMNN retains more cell-type-specific features, partially manifested by differentially expressed genes identified between cell types after SMNN correction being biologically more relevant, with precision improving by up to 841.0%.
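
The supervised mutual-nearest-neighbor step can be sketched as follows: MNN pairs are searched only among cells that share a cluster label across the two batches. The fragment below is illustrative, not the SMNN package.

```python
# Supervised MNN detection: restrict mutual-nearest-neighbour search to cells
# carrying the same cluster label in both batches (illustrative sketch).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def supervised_mnn(X1, labels1, X2, labels2, k=20):
    pairs = []
    for lab in np.intersect1d(labels1, labels2):
        i1, i2 = np.where(labels1 == lab)[0], np.where(labels2 == lab)[0]
        kk = min(k, len(i1), len(i2))
        nn12 = NearestNeighbors(n_neighbors=kk).fit(X2[i2]).kneighbors(X1[i1], return_distance=False)
        nn21 = NearestNeighbors(n_neighbors=kk).fit(X1[i1]).kneighbors(X2[i2], return_distance=False)
        for a, neigh in enumerate(nn12):
            for b in neigh:
                if a in nn21[b]:                      # mutual within the shared cell type
                    pairs.append((i1[a], i2[b]))
    return pairs

rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(300, 30)), rng.normal(0.3, 1, size=(300, 30))
labels1, labels2 = rng.integers(0, 3, 300), rng.integers(0, 3, 300)
print(len(supervised_mnn(X1, labels1, X2, labels2)), "supervised MNN pairs")
```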


Subjects
Algorithms, Databases, Nucleic Acid, RNA-Seq, Single-Cell Analysis, Cluster Analysis, Humans
20.
Brief Bioinform ; 22(1): 416-427, 2021 01 18.
Article in English | MEDLINE | ID: mdl-31925417

ABSTRACT

Recent advances in single-cell RNA sequencing (scRNA-seq) enable characterization of transcriptomic profiles with single-cell resolution and circumvent averaging artifacts associated with traditional bulk RNA sequencing (RNA-seq) data. Here, we propose SCDC, a deconvolution method for bulk RNA-seq that leverages cell-type specific gene expression profiles from multiple scRNA-seq reference datasets. SCDC adopts an ENSEMBLE method to integrate deconvolution results from different scRNA-seq datasets that are produced in different laboratories and at different times, implicitly addressing the problem of batch-effect confounding. SCDC is benchmarked against existing methods using both in silico generated pseudo-bulk samples and experimentally mixed cell lines, whose known cell-type compositions serve as ground truths. We show that SCDC outperforms existing methods with improved accuracy of cell-type decomposition under both settings. To illustrate how the ENSEMBLE framework performs in complex tissues under different scenarios, we further apply our method to a human pancreatic islet dataset and a mouse mammary gland dataset. SCDC returns results that are more consistent with experimental designs and that reproduce more significant associations between cell-type proportions and measured phenotypes.
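
Reference-based deconvolution of the kind SCDC builds on can be sketched with non-negative least squares: a cell-type signature matrix from one scRNA-seq reference is fitted to a bulk profile and the coefficients are normalized to proportions; SCDC's ENSEMBLE step would additionally weight such estimates across multiple references. All data below are simulated.

```python
# Minimal NNLS deconvolution of a simulated bulk mixture against a cell-type
# signature matrix (illustrative sketch, not the SCDC package).
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n_genes, cell_types = 500, ["alpha", "beta", "ductal"]
signatures = rng.gamma(2.0, 1.0, size=(n_genes, len(cell_types)))   # mean expression per type

true_props = np.array([0.5, 0.3, 0.2])
bulk = signatures @ true_props + rng.normal(0, 0.05, size=n_genes)  # simulated bulk mixture

coef, _ = nnls(signatures, bulk)
props = coef / coef.sum()
print(dict(zip(cell_types, props.round(3))), "true:", true_props)
```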


Subjects
RNA-Seq/methods, Single-Cell Analysis/methods, Software/standards, Animals, Female, Gene Expression Regulation, Neoplastic, Humans, Islets of Langerhans/metabolism, MCF-7 Cells, Mammary Glands, Animal/metabolism, Mice, RNA-Seq/standards, Reference Standards, Single-Cell Analysis/standards