The impact of package selection and versioning on single-cell RNA-seq analysis.
Rich, Joseph M; Moses, Lambda; Einarsson, Pétur Helgi; Jackson, Kayla; Luebbert, Laura; Booeshaghi, A Sina; Antonsson, Sindri; Sullivan, Delaney K; Bray, Nicolas; Melsted, Páll; Pachter, Lior.
Affiliation
  • Rich JM; Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA.
  • Moses L; USC-Caltech MD/PhD Program, Keck School of Medicine, Los Angeles, CA, 90033, USA.
  • Einarsson PH; Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA.
  • Jackson K; Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, Reykjavík, Iceland.
  • Luebbert L; Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA.
  • Booeshaghi AS; USC-Caltech MD/PhD Program, Keck School of Medicine, Los Angeles, CA, 90033, USA.
  • Antonsson S; Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA.
  • Sullivan DK; Department of Bioengineering, University of California Berkeley, Berkeley, CA, USA.
  • Bray N; Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, Reykjavík, Iceland.
  • Melsted P; Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA.
  • Pachter L; UCLA-Caltech Medical Scientist Training Program, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA.
bioRxiv; 2024 Apr 11.
Article in En | MEDLINE | ID: mdl-38617255
ABSTRACT
Standard single-cell RNA-sequencing (scRNA-seq) analysis workflows consist of converting raw read data into cell-gene count matrices through sequence alignment, followed by analyses including filtering, highly variable gene selection, dimensionality reduction, clustering, and differential expression analysis. Seurat and Scanpy are the most widely used packages implementing such workflows, and are generally thought to implement individual steps similarly. We investigate in detail the algorithms and methods underlying Seurat and Scanpy and find that there are, in fact, considerable differences in their outputs. The extent of the differences between the programs is approximately equivalent to the variability that would be introduced in benchmarking scRNA-seq datasets by sequencing less than 5% of the reads or analyzing less than 20% of the cell population. Additionally, distinct versions of Seurat and Scanpy can produce very different results, especially during parts of differential expression analysis. Our analysis highlights the need for users of scRNA-seq to carefully assess the tools on which they rely, and the importance of developers of scientific software prioritizing transparency, consistency, and reproducibility in their tools.

Full text: 1 Collection: 01-internacional Database: MEDLINE Language: En Journal: BioRxiv Year: 2024 Document type: Article Affiliation country: United States Country of publication: United States
