The effect of data transformation on low-dimensional integration of single-cell RNA-seq.
Park, Youngjun; Hauschild, Anne-Christin.
Affiliation
  • Park Y; Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany.
  • Hauschild AC; International Max Planck Research Schools for Genome Science, Georg-August-Universität Göttingen, Göttingen, Germany.
BMC Bioinformatics ; 25(1): 171, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38689234
ABSTRACT

BACKGROUND:

Recent developments in single-cell RNA sequencing have opened up a multitude of possibilities to study tissues at the level of cellular populations. However, the heterogeneity in single-cell sequencing data necessitates appropriate procedures to adjust for technological limitations and various sources of noise when integrating datasets from different studies. While many analysis procedures employ various preprocessing steps, they often overlook the importance of selecting and optimizing the employed data transformation methods.

RESULTS:

This work investigates data transformation approaches used in single-cell clustering analysis tools and their effects on batch integration analysis. In particular, we compare 16 transformations and their impact on low-dimensional representations, with the aim of reducing the batch effect and integrating multiple single-cell sequencing datasets. Our results show that data transformations strongly influence the results of single-cell clustering in low-dimensional spaces, such as those generated by UMAP or PCA. Moreover, these changes in low-dimensional space significantly affect trajectory analysis across multiple datasets. However, the performance of the data transformations varies greatly across datasets, and the optimal method was different for each dataset. Additionally, we explored how data transformation impacts the analysis of deep feature encodings from deep neural network-based models, including autoencoder-based models and prototypical networks. Data transformation also strongly affects the outcome of these deep neural network models.
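The abstract does not list the 16 transformations compared, so the following is only a minimal sketch of the general workflow it describes: apply several candidate transformations to a count matrix and project each result into a low-dimensional space with PCA. The two transformations (`log1p_cp10k`, `sqrt_transform`), the toy Poisson count matrix, and all function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def log1p_cp10k(counts):
    """Scale each cell (row) to 10,000 total counts, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals = np.where(totals == 0, 1.0, totals)  # avoid division by zero
    return np.log1p(counts / totals * 1e4)

def sqrt_transform(counts):
    """Square-root transform, a common variance-stabilizing choice for counts."""
    return np.sqrt(np.asarray(counts, dtype=float))

def pca(X, n_components=2):
    """Project rows of X onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)  # center each gene (column)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

# Toy data (assumption): 60 cells x 30 genes of Poisson counts.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=2.0, size=(60, 30))

# Hypothetical set of candidate transformations to compare.
transformations = {"log1p_cp10k": log1p_cp10k, "sqrt": sqrt_transform}
embeddings = {name: pca(f(counts)) for name, f in transformations.items()}

for name, emb in embeddings.items():
    print(name, emb.shape)
```

Each entry of `embeddings` is a cells-by-components matrix that could then be passed to clustering or to a batch mixing metric.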

CONCLUSIONS:

Our findings suggest that the batch effect and noise in integrative analysis are highly influenced by data transformation. Low-dimensional features can integrate different batches well when a proper data transformation is applied. Furthermore, we found that the batch mixing score in low-dimensional space can guide the selection of the optimal data transformation. In conclusion, data preprocessing is one of the most crucial analysis steps and needs to be carefully considered in the integrative analysis of multiple scRNA-seq datasets.
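The abstract does not define the batch mixing score it refers to. As a rough illustration of the idea only, the sketch below uses a simple stand-in: the mean fraction of each cell's k nearest neighbors that belong to a different batch, computed on a low-dimensional embedding. The score definition, the function names, the labels `transform_a` and `transform_b`, and the synthetic embeddings are all assumptions made for this example.

```python
import numpy as np

def batch_mixing_score(embedding, batches, k=5):
    """Mean fraction of each cell's k nearest neighbors from another batch."""
    embedding = np.asarray(embedding, dtype=float)
    batches = np.asarray(batches)
    # Pairwise squared Euclidean distances between all cells.
    diff = embedding[:, None, :] - embedding[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    other_batch = batches[neighbors] != batches[:, None]
    return float(other_batch.mean())

def select_transformation(embeddings, batches, k=5):
    """Return the name with the highest mixing score, plus all scores."""
    scores = {name: batch_mixing_score(emb, batches, k)
              for name, emb in embeddings.items()}
    return max(scores, key=scores.get), scores

# Synthetic embeddings (assumption): two batches of 40 cells each.
rng = np.random.default_rng(0)
batches = np.repeat(["batch_1", "batch_2"], 40)
mixed = rng.normal(size=(80, 2))             # batches overlap
separated = mixed.copy()
separated[batches == "batch_2", 0] += 100.0  # strong batch shift

embeddings = {"transform_a": separated, "transform_b": mixed}
best, scores = select_transformation(embeddings, batches)
print(best, scores)
```

Under this stand-in score, a value near 0 means neighbors almost always share a batch (poor mixing), while for two equally sized batches a value near 0.5 indicates good mixing. A score like this ignores whether cell types are preserved, so in practice it would need to be read alongside a biological conservation metric.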

Full text: 1 Databases: MEDLINE Main subject: RNA-Seq / Single-Cell Gene Expression Analysis Limit: Humans Language: En Journal: BMC Bioinformatics Journal subject: MEDICAL INFORMATICS Year: 2024 Document type: Article Country of affiliation: Germany