Inferring cellular and molecular processes in single-cell data with non-negative matrix factorization using Python, R and GenePattern Notebook implementations of CoGAPS.
Johnson, Jeanette A I; Tsang, Ashley P; Mitchell, Jacob T; Zhou, David L; Bowden, Julia; Davis-Marcisak, Emily; Sherman, Thomas; Liefeld, Ted; Loth, Melanie; Goff, Loyal A; Zimmerman, Jacquelyn W; Kinny-Köster, Ben; Jaffee, Elizabeth M; Tamayo, Pablo; Mesirov, Jill P; Reich, Michael; Fertig, Elana J; Stein-O'Brien, Genevieve L.
Affiliation
  • Johnson JAI; Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA.
  • Tsang AP; Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA.
  • Mitchell JT; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.
  • Zhou DL; Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA.
  • Bowden J; Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA.
  • Davis-Marcisak E; Department of Neuroscience, Johns Hopkins University, Baltimore, MD, USA.
  • Sherman T; Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA.
  • Liefeld T; Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA.
  • Loth M; Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA.
  • Goff LA; Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA.
  • Zimmerman JW; Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA.
  • Kinny-Köster B; Department of Medicine, Moores Cancer Center, University of California San Diego, San Diego, CA, USA.
  • Jaffee EM; Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA.
  • Tamayo P; Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD, USA.
  • Mesirov JP; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.
  • Reich M; Department of Neuroscience, Johns Hopkins University, Baltimore, MD, USA.
  • Fertig EJ; Kavli Neurodiscovery Institute, Johns Hopkins University, Baltimore, MD, USA.
  • Stein-O'Brien GL; Single Cell Training and Analysis Center, Johns Hopkins University, Baltimore, MD, USA.
Nat Protoc ; 18(12): 3690-3731, 2023 Dec.
Article in En | MEDLINE | ID: mdl-37989764
ABSTRACT
Non-negative matrix factorization (NMF) is an unsupervised learning method well suited to high-throughput biology. However, inferring biological processes from an NMF result still requires additional post hoc statistics and annotation to interpret the learned features. Here, we introduce a suite of computational tools that implement NMF and provide methods for accurate and clear biological interpretation and analysis. A general discussion of NMF covering its benefits, limitations and open questions is followed by four procedures for the Bayesian NMF algorithm Coordinated Gene Activity across Pattern Subsets (CoGAPS). Each procedure demonstrates NMF analysis to quantify cell state transitions in a public domain single-cell RNA-sequencing dataset. The first demonstrates PyCoGAPS, our new Python implementation that enhances runtime for large datasets, and the second allows its deployment in Docker. The third procedure steps through the same single-cell NMF analysis using our R CoGAPS interface. The fourth introduces a beginner-friendly CoGAPS platform using GenePattern Notebook, aimed at users with a working conceptual knowledge of data analysis but without basic proficiency in the R or Python programming language. We also constructed a user-facing website to serve as a central repository for information and instructional materials about CoGAPS and its application programming interfaces. Setting up the packages and conducting a test run is expected to take around 15 min, and conducting analyses on a precomputed result an additional 30 min. The expected runtime on the user's own dataset can vary from hours to days depending on factors such as dataset size and input parameters.
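To make the factorization described in the abstract concrete, the following is a minimal sketch of generic NMF on a toy genes-by-cells matrix. It is not CoGAPS or PyCoGAPS: it assumes the classic multiplicative-update rule for squared error instead of the Bayesian sampler the protocol uses, and every name in it (`nmf`, `reconstruction_error`, `W`, `H`, the toy matrix `D`) is hypothetical and chosen for illustration only. It is written in plain Python so that it runs without dependencies; a real analysis would rely on an optimized library.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    Y_cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in Y_cols] for row in X]


def transpose(X):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def nmf(D, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate non-negative D (genes x cells) as W (genes x k) times H (k x cells).

    Assumption: multiplicative updates for squared error, not the CoGAPS sampler.
    """
    rng = random.Random(seed)
    n, m = len(D), len(D[0])
    # Strictly positive random start so the updates cannot divide by zero.
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T D) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, D)
        den = matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (D H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(D, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H


def reconstruction_error(D, W, H):
    """Sum of squared differences between D and the product W H."""
    R = matmul(W, H)
    return sum((d - r) ** 2 for d_row, r_row in zip(D, R)
               for d, r in zip(d_row, r_row))


# Toy data: 4 genes x 6 cells, with two gene groups active in different cells.
D = [
    [5, 5, 5, 0, 0, 0],
    [4, 4, 4, 0, 0, 0],
    [0, 0, 0, 3, 3, 3],
    [0, 0, 0, 6, 6, 6],
]

W, H = nmf(D, k=2)
print(f"squared reconstruction error: {reconstruction_error(D, W, H):.4f}")
```

In this sketch, each column of `W` gives gene weights for one learned feature and each row of `H` gives that feature's activity across cells. Deciding what a feature means biologically is the post hoc interpretation step that the abstract says still requires additional statistics and annotation.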
Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Algorithms / Programming Languages | Language: En | Journal: Nat Protoc | Year: 2023 | Document type: Article | Affiliation country: United States