ABSTRACT
Image-based cell profiling is a powerful tool that compares perturbed cell populations by measuring thousands of single-cell features and summarizing them into profiles. Typically a sample is represented by averaging across cells, but this fails to capture the heterogeneity within cell populations. We introduce CytoSummaryNet: a Deep Sets-based approach that improves mechanism of action prediction by 30-68% in mean average precision compared to average profiling on a public dataset. CytoSummaryNet uses self-supervised contrastive learning in a multiple-instance learning framework, providing an easier-to-apply method for aggregating single-cell feature data than previously published strategies. Interpretability analysis suggests that the model achieves this improvement by downweighting small mitotic cells or those with debris and prioritizing large uncrowded cells. The approach requires only perturbation labels for training, which are readily available in all cell profiling datasets. CytoSummaryNet offers a straightforward post-processing step for single-cell profiles that can significantly boost retrieval performance on image-based profiling datasets.
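The contrast between average profiling and a learned, weighted aggregation can be illustrated with a toy sketch. This is not CytoSummaryNet's architecture: the per-cell `scores` below are random stand-ins for what a trained network would output, and the function names and array sizes are assumptions made for illustration only. It shows the one property any set-based aggregator must have, namely that the pooled profile does not depend on the order of the cells.

```python
import numpy as np

def mean_profile(cells):
    """Baseline: average all single-cell feature vectors into one profile."""
    return cells.mean(axis=0)

def weighted_profile(cells, scores):
    """Permutation-invariant pooling: softmax the per-cell scores into
    weights that sum to 1, then take the weighted sum of feature vectors."""
    shifted = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(shifted) / np.exp(shifted).sum()
    return weights @ cells

rng = np.random.default_rng(0)
cells = rng.normal(size=(500, 8))  # toy data: 500 cells x 8 features
scores = rng.normal(size=500)      # hypothetical per-cell relevance scores

avg = mean_profile(cells)
pooled = weighted_profile(cells, scores)
```

With all scores equal, the weighted profile reduces to the plain average, so the baseline is a special case of the weighted scheme. Cells with low scores contribute little, which is the kind of downweighting the interpretability analysis describes.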
ABSTRACT
Voltage imaging enables high-throughput investigation of neuronal activity, yet its utility is often constrained by a low signal-to-noise ratio (SNR). Conventional denoising algorithms, such as those based on matrix factorization, impose limiting assumptions about the noise process and the spatiotemporal structure of the signal. While deep learning based denoising techniques offer greater adaptability, existing approaches fail to fully exploit the fast temporal dynamics and unique short- and long-range dependencies within voltage imaging datasets. Here, we introduce CellMincer, a novel self-supervised deep learning method designed specifically for denoising voltage imaging datasets. CellMincer operates on the principle of masking and predicting sparse sets of pixels across short temporal windows and conditions the denoiser on precomputed spatiotemporal auto-correlations to effectively model long-range dependencies without the need for large temporal denoising contexts. We develop and utilize a physics-based simulation framework to generate realistic datasets for rigorous hyperparameter optimization and ablation studies, highlighting the key role of conditioning the denoiser on precomputed spatiotemporal auto-correlations to achieve 3-fold further reduction in noise. Comprehensive benchmarking on both simulated and real voltage imaging datasets, including those with paired patch-clamp electrophysiology (EP) as ground truth, demonstrates CellMincer's state-of-the-art performance. It achieves substantial noise reduction across the entire frequency spectrum, enhanced detection of subthreshold events, and superior cross-correlation with ground-truth EP recordings. Finally, we demonstrate how CellMincer's addition to a typical voltage imaging data analysis workflow improves neuronal segmentation, peak detection, and ultimately leads to significantly enhanced separation of functional phenotypes.
ABSTRACT
Single-cell transcriptomics, in conjunction with genetic and compound perturbations, offers a robust approach for exploring cellular behaviors in diverse contexts. Such experiments allow uncovering cell-state-specific responses to perturbations, a crucial aspect in unraveling the intricate molecular mechanisms governing cellular behavior and potentially discovering novel regulatory pathways and therapeutic targets. However, prevailing computational methods predominantly focus on predicting average cellular responses, disregarding the inherent response heterogeneity associated with cell state diversity. In this study, we present CellCap, a deep generative model designed for the end-to-end analysis of single-cell perturbation experiments. CellCap employs sparse dictionary learning in a latent space to deconstruct cell-state-specific perturbation responses into a set of transcriptional response programs. These programs are then utilized by each perturbation condition and each cell to varying degrees. The incorporation of specific model design choices, such as dot-product cross-attention between cell states and response programs, along with a linearly-decoded latent space, underlies the interpretive power of CellCap. We evaluate CellCap's model interpretability through multiple simulated scenarios and apply it to two real single-cell perturbation datasets. These datasets feature either heterogeneous cellular populations or a complex experimental setup. Our results demonstrate that CellCap successfully uncovers the relationship between cell state and perturbation response, unveiling novel insights overlooked in previous analyses. The model's interpretability, coupled with its effectiveness in capturing heterogeneous responses, positions CellCap as a valuable tool for advancing our understanding of cellular behaviors in the context of perturbation experiments.
ABSTRACT
Measuring the phenotypic effect of treatments on cells through imaging assays is an efficient and powerful way of studying cell biology, and requires computational methods for transforming images into quantitative data. Here, we present an improved strategy for learning representations of treatment effects from high-throughput imaging, following a causal interpretation. We use weakly supervised learning for modeling associations between images and treatments, and show that it encodes both confounding factors and phenotypic features in the learned representation. To facilitate their separation, we constructed a large training dataset with images from five different studies to maximize experimental diversity, following insights from our causal analysis. Training a model with this dataset successfully improves downstream performance, and produces a reusable convolutional network for image-based profiling, which we call Cell Painting CNN. We evaluated our strategy on three publicly available Cell Painting datasets, and observed that the Cell Painting CNN improves performance in downstream analysis by up to 30% with respect to classical features, while also being more computationally efficient.
Subject(s)
Neural Networks, Computer
ABSTRACT
Full-length RNA-sequencing methods using long-read technologies can capture complete transcript isoforms, but their throughput is limited. We introduce multiplexed arrays isoform sequencing (MAS-ISO-seq), a technique for programmably concatenating complementary DNAs (cDNAs) into molecules optimal for long-read sequencing, increasing the throughput >15-fold to nearly 40 million cDNA reads per run on the Sequel IIe sequencer. When applied to single-cell RNA sequencing of tumor-infiltrating T cells, MAS-ISO-seq demonstrated a 12- to 32-fold increase in the discovery of differentially spliced genes.
Subject(s)
High-Throughput Nucleotide Sequencing , RNA Isoforms , DNA, Complementary/genetics , RNA Isoforms/genetics , High-Throughput Nucleotide Sequencing/methods , Protein Isoforms/genetics , Sequence Analysis, RNA/methods , Transcriptome , Gene Expression Profiling/methods , RNA/genetics
ABSTRACT
Background: Despite the critical role of the cardiovascular system, our understanding of its cellular and transcriptional diversity remains limited. We therefore sought to characterize the cellular composition, phenotypes, molecular pathways, and communication networks between cell types at the tissue and sub-tissue level across the cardiovascular system of the healthy Wistar rat, an important model in preclinical cardiovascular research. We obtained high quality tissue samples under controlled conditions that reveal a level of cellular detail so far inaccessible in human studies. Methods and Results: We performed single nucleus RNA-sequencing in 78 samples in 10 distinct regions including the four chambers of the heart, ventricular septum, sinoatrial node, atrioventricular node, aorta, pulmonary artery, and pulmonary veins (PV), which produced an aggregate map of 505,835 nuclei. We identified 26 distinct cell types and additional subtypes, including a number of rare cell types such as PV cardiomyocytes and non-myelinating Schwann cells (NMSCs), and unique groups of vascular smooth muscle cells (VSMCs), endothelial cells (ECs) and fibroblasts (FBs), which gave rise to a detailed cell type distribution across tissues. We demonstrated differences in the cellular composition across different cardiac regions and tissue-specific differences in transcription for each cell type, highlighting the molecular diversity and complex tissue architecture of the cardiovascular system. Specifically, we observed great transcriptional heterogeneities among ECs and FBs. Importantly, several cell subtypes had a unique regional localization such as a subtype of VSMCs enriched in the large vasculature. We found the cellular makeup of PV tissue is closer to heart tissue than to the large arteries. We further explored the ligand-receptor repertoire across cell clusters and tissues, and observed tissue-enriched cellular communication networks, including heightened Nppa-Npr1/2/3 signaling in the sinoatrial node. Conclusions: Through a large single nucleus sequencing effort encompassing over 500,000 nuclei, we broadened our understanding of cellular transcription in the healthy cardiovascular system. The existence of tissue-restricted cellular phenotypes suggests regional regulation of cardiovascular physiology. The overall conservation in gene expression and molecular pathways across rat and human cell types, together with our detailed transcriptional characterization of each cell type, offers the potential to identify novel therapeutic targets and improve preclinical models of cardiovascular disease.
ABSTRACT
Droplet-based single-cell assays, including single-cell RNA sequencing (scRNA-seq), single-nucleus RNA sequencing (snRNA-seq) and cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq), generate considerable background noise counts, the hallmark of which is nonzero counts in cell-free droplets and off-target gene expression in unexpected cell types. Such systematic background noise can lead to batch effects and spurious differential gene expression results. Here we develop a deep generative model based on the phenomenology of noise generation in droplet-based assays. The proposed model accurately distinguishes cell-containing droplets from cell-free droplets, learns the background noise profile and provides noise-free quantification in an end-to-end fashion. We implement this approach in the scalable and robust open-source software package CellBender. Analysis of simulated data demonstrates that CellBender operates near the theoretically optimal denoising limit. Extensive evaluations using real datasets and experimental benchmarks highlight enhanced concordance between droplet-based single-cell data and established gene expression patterns, while the learned background noise profile provides evidence of degraded or uncaptured cell types.
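The idea of a background noise profile can be made concrete with a deliberately naive heuristic. The sketch below is not CellBender's generative model: it simply pools counts from droplets whose total count falls at or below a hypothetical cutoff (`empty_max_umi`, an assumption made here) and normalizes them into a per-gene frequency vector. The function name and the toy count matrix are likewise illustrative.

```python
import numpy as np

def ambient_profile(counts, empty_max_umi):
    """Estimate a per-gene background frequency vector from low-count droplets.

    counts: (droplets x genes) array of non-negative integer counts.
    empty_max_umi: hypothetical cutoff; droplets with 1..cutoff total counts
    are treated as cell-free.
    """
    totals = counts.sum(axis=1)
    empty = counts[(totals > 0) & (totals <= empty_max_umi)]
    pooled = empty.sum(axis=0)
    if pooled.sum() == 0:
        raise ValueError("no droplets fall at or below the cutoff")
    return pooled / pooled.sum()

# Toy matrix: two near-empty droplets, one cell-containing droplet, one blank.
counts = np.array([
    [1, 0, 1],
    [0, 2, 0],
    [50, 30, 20],
    [0, 0, 0],
])
profile = ambient_profile(counts, empty_max_umi=5)
```

A hard cutoff like this is exactly what a probabilistic model avoids, since the boundary between cell-free and cell-containing droplets is rarely sharp in real data.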
Subject(s)
RNA, Small Nuclear , Software , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Gene Expression Profiling/methods
ABSTRACT
Copy number variants (CNVs) are major contributors to genetic diversity and disease. While standardized methods, such as the genome analysis toolkit (GATK), exist for detecting short variants, technical challenges have confounded uniform large-scale CNV analyses from whole-exome sequencing (WES) data. Given the profound impact of rare and de novo coding CNVs on genome organization and human disease, we developed GATK-gCNV, a flexible algorithm to discover rare CNVs from sequencing read-depth information, complete with open-source distribution via GATK. We benchmarked GATK-gCNV in 7,962 exomes from individuals in quartet families with matched genome sequencing and microarray data, finding up to 95% recall of rare coding CNVs at a resolution of more than two exons. We used GATK-gCNV to generate a reference catalog of rare coding CNVs in WES data from 197,306 individuals in the UK Biobank, and observed strong correlations between per-gene CNV rates and measures of mutational constraint, as well as rare CNV associations with multiple traits. In summary, GATK-gCNV is a tunable approach for sensitive and specific CNV discovery in WES data, with broad applications.
Subject(s)
DNA Copy Number Variations , Exome , Humans , Exome/genetics , Exome Sequencing , DNA Copy Number Variations/genetics , Chromosome Mapping , Exons
ABSTRACT
3D electron microscopy (EM) connectomics image volumes are surpassing 1 mm³, providing information-dense, multi-scale visualizations of brain circuitry and necessitating scalable analysis techniques. We present SynapseCLR, a self-supervised contrastive learning method for 3D EM data, and use it to extract features of synapses from mouse visual cortex. SynapseCLR feature representations separate synapses by appearance and functionally important structural annotations. We demonstrate SynapseCLR's utility for valuable downstream tasks, including one-shot identification of defective synapse segmentations, dataset-wide similarity-based querying, and accurate imputation of annotations for unlabeled synapses, using manual annotation of only 0.2% of the dataset's synapses. In particular, excitatory versus inhibitory neuronal types can be assigned with >99.8% accuracy to individual synapses and highly truncated neurites, enabling neurite-enhanced connectomics analysis. Finally, we present a data-driven, unsupervised study of synaptic structural variation on the representation manifold, revealing its intrinsic axes of variation and showing that representations contain inhibitory subtype information.
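Similarity-based querying over learned feature vectors can be sketched as a cosine-similarity nearest-neighbor lookup. This is a generic illustration rather than SynapseCLR's own code: the function name, the two-dimensional toy embeddings, and the choice of cosine similarity are all assumptions made here.

```python
import numpy as np

def top_k_similar(embeddings, query_index, k):
    """Return indices of the k rows most cosine-similar to the query row."""
    emb = np.asarray(embeddings, dtype=float)
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalize rows
    sims = unit @ unit[query_index]  # cosine similarity to the query
    sims[query_index] = -np.inf      # exclude the query itself
    return np.argsort(-sims)[:k]     # highest similarity first

# Toy embeddings standing in for learned per-synapse feature vectors.
embeddings = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
])
neighbors = top_k_similar(embeddings, query_index=0, k=2)
```

For a dataset-wide query over millions of vectors, an approximate nearest-neighbor index would normally replace the dense matrix product used here.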
ABSTRACT
SARS-CoV-2 distribution and circulation dynamics are not well understood due to challenges in assessing genomic data from tissue samples. We develop experimental and computational workflows for high-depth viral sequencing and high-resolution genomic analyses from formalin-fixed, paraffin-embedded tissues and apply them to 120 specimens from six subjects with fatal COVID-19. To varying degrees, viral RNA is present in extrapulmonary tissues from all subjects. The majority of the 180 viral variants identified within subjects are unique to individual tissue samples. We find more high-frequency (>10%) minor variants in subjects with a longer disease course, with one subject harboring ten such variants, exclusively in extrapulmonary tissues. One tissue-specific high-frequency variant was a nonsynonymous mutation in the furin-cleavage site of the spike protein. Our findings suggest adaptation and/or compartmentalized infection, illuminating the basis of extrapulmonary COVID-19 symptoms and potential for viral reservoirs, and have broad utility for investigating human pathogens.
Subject(s)
COVID-19 , Humans , SARS-CoV-2/genetics , SARS-CoV-2/metabolism , Mutation , Spike Glycoprotein, Coronavirus/genetics , Spike Glycoprotein, Coronavirus/metabolism
ABSTRACT
Ebola virus (EBOV) causes Ebola virus disease (EVD), marked by severe hemorrhagic fever; however, the mechanisms underlying the disease remain unclear. To assess the molecular basis of EVD across time, we performed RNA sequencing on 17 tissues from a natural history study of 21 rhesus monkeys, developing new methods to characterize host-pathogen dynamics. We identified alterations in host gene expression with previously unknown tissue-specific changes, including downregulation of genes related to tissue connectivity. EBOV was widely disseminated throughout the body; using a new, broadly applicable deconvolution method, we found that viral load correlated with increased monocyte presence. Patterns of viral variation between tissues differentiated primary infections from compartmentalized infections, and several variants impacted viral fitness in an EBOV/Kikwit minigenome system, suggesting that functionally significant variants can emerge during early infection. This comprehensive portrait of host-pathogen dynamics in EVD illuminates new features of pathogenesis and establishes resources to study other emerging pathogens.
Subject(s)
Ebolavirus , Hemorrhagic Fever, Ebola , Hemorrhagic Fevers, Viral , Animals , Hemorrhagic Fever, Ebola/pathology , Macaca mulatta , Ebolavirus/genetics
ABSTRACT
The molecular underpinnings of organ dysfunction in acute COVID-19 and its potential long-term sequelae are under intense investigation. To shed light on these in the context of liver function, we performed single-nucleus RNA-seq and spatial transcriptomic profiling of livers from 17 COVID-19 decedents. We identified hepatocytes positive for SARS-CoV-2 RNA with an expression phenotype resembling infected lung epithelial cells. Integrated analysis and comparisons with healthy controls revealed extensive changes in the cellular composition and expression states in COVID-19 liver, reflecting hepatocellular injury, ductular reaction, pathologic vascular expansion, and fibrogenesis. We also observed Kupffer cell proliferation and erythrocyte progenitors for the first time in a human liver single-cell atlas, resembling similar responses in liver injury in mice and in sepsis, respectively. Despite the absence of a clinical acute liver injury phenotype, endothelial cell composition was dramatically impacted in COVID-19, concomitantly with extensive alterations and profibrogenic activation of reactive cholangiocytes and mesenchymal cells. Our atlas provides novel insights into liver physiology and pathology in COVID-19 and forms a foundational resource for its investigation and understanding.
ABSTRACT
Some individuals with autism spectrum disorder (ASD) carry functional mutations rarely observed in the general population. We explored the genes disrupted by these variants from joint analysis of protein-truncating variants (PTVs), missense variants and copy number variants (CNVs) in a cohort of 63,237 individuals. We discovered 72 genes associated with ASD at false discovery rate (FDR) ≤ 0.001 (185 at FDR ≤ 0.05). De novo PTVs, damaging missense variants and CNVs represented 57.5%, 21.1% and 8.44% of association evidence, while CNVs conferred greatest relative risk. Meta-analysis with cohorts ascertained for developmental delay (DD) (n = 91,605) yielded 373 genes associated with ASD/DD at FDR ≤ 0.001 (664 at FDR ≤ 0.05), some of which differed in relative frequency of mutation between ASD and DD cohorts. The DD-associated genes were enriched in transcriptomes of progenitor and immature neuronal cells, whereas genes showing stronger evidence in ASD were more enriched in maturing neurons and overlapped with schizophrenia-associated genes, emphasizing that these neuropsychiatric disorders may share common pathways to risk.
Subject(s)
Autism Spectrum Disorder , Autistic Disorder , Autism Spectrum Disorder/genetics , Autistic Disorder/genetics , DNA Copy Number Variations/genetics , Genetic Predisposition to Disease , Humans , Mutation
ABSTRACT
Repeated emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants with increased fitness underscores the value of rapid detection and characterization of new lineages. We have developed PyR0, a hierarchical Bayesian multinomial logistic regression model that infers relative prevalence of all viral lineages across geographic regions, detects lineages increasing in prevalence, and identifies mutations relevant to fitness. Applying PyR0 to all publicly available SARS-CoV-2 genomes, we identify numerous substitutions that increase fitness, including previously identified spike mutations and many nonspike mutations within the nucleocapsid and nonstructural proteins. PyR0 forecasts growth of new lineages from their mutational profile, ranks the fitness of lineages as new sequences become available, and prioritizes mutations of biological and public health concern for functional characterization.
Subject(s)
COVID-19 , Genetic Fitness , SARS-CoV-2 , Bayes Theorem , COVID-19/virology , Genome, Viral , Humans , Mutation , Regression Analysis , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/chemistry , Spike Glycoprotein, Coronavirus/genetics
ABSTRACT
Repeated emergence of SARS-CoV-2 variants with increased fitness necessitates rapid detection and characterization of new lineages. To address this need, we developed PyR0, a hierarchical Bayesian multinomial logistic regression model that infers relative prevalence of all viral lineages across geographic regions, detects lineages increasing in prevalence, and identifies mutations relevant to fitness. Applying PyR0 to all publicly available SARS-CoV-2 genomes, we identify numerous substitutions that increase fitness, including previously identified spike mutations and many non-spike mutations within the nucleocapsid and nonstructural proteins. PyR0 forecasts growth of new lineages from their mutational profile, identifies viral lineages of concern as they emerge, and prioritizes mutations of biological and public health concern for functional characterization. ONE SENTENCE SUMMARY: A Bayesian hierarchical model of all SARS-CoV-2 viral genomes predicts lineage fitness and identifies associated mutations.
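The multinomial logistic form at the core of such a model can be sketched with fixed parameters. The example below is not the PyR0 implementation: it omits the hierarchical Bayesian structure, the regional terms, and the mutation-level coefficients, and its function name, intercepts, and growth rates are hypothetical values chosen for illustration. It only shows how per-lineage linear trends in logit space turn into relative prevalences through a softmax.

```python
import numpy as np

def lineage_prevalence(intercepts, growth_rates, times):
    """Relative prevalence of each lineage at each time point.

    logit[t, l] = intercepts[l] + growth_rates[l] * times[t]
    prevalence  = softmax over lineages (each row sums to 1).
    """
    logits = intercepts[None, :] + np.outer(times, growth_rates)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

# Two hypothetical lineages: an established one and a rarer, faster-growing one.
intercepts = np.array([0.0, -3.0])
growth_rates = np.array([0.0, 0.1])   # per day, in logit space
times = np.array([0.0, 30.0, 60.0])   # days

prevalence = lineage_prevalence(intercepts, growth_rates, times)
```

In this toy setting the second lineage starts rare, reaches parity at day 30, and dominates by day 60. Only differences in growth rate between lineages matter, because the softmax is unchanged when the same constant is added to every lineage's logit.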
ABSTRACT
BACKGROUND: The human heart requires a complex ensemble of specialized cell types to perform its essential function. A greater knowledge of the intricate cellular milieu of the heart is critical to increase our understanding of cardiac homeostasis and pathology. As recent advances in low-input RNA sequencing have allowed definitions of cellular transcriptomes at single-cell resolution at scale, we have applied these approaches to assess the cellular and transcriptional diversity of the nonfailing human heart. METHODS: Microfluidic encapsulation and barcoding were used to perform single nuclear RNA sequencing with samples from 7 human donors, selected for their absence of overt cardiac disease. Individual nuclear transcriptomes were then clustered based on transcriptional profiles of highly variable genes. These clusters were used as the basis for between-chamber and between-sex differential gene expression analyses and intersection with genetic and pharmacologic data. RESULTS: We sequenced the transcriptomes of 287,269 single cardiac nuclei, revealing 9 major cell types and 20 subclusters of cell types within the human heart. Cellular subclasses include 2 distinct groups of resident macrophages, 4 endothelial subtypes, and 2 fibroblast subsets. Comparisons of cellular transcriptomes by cardiac chamber or sex reveal diversity not only in cardiomyocyte transcriptional programs but also in subtypes involved in extracellular matrix remodeling and vascularization. Using genetic association data, we identified strong enrichment for the role of cell subtypes in cardiac traits and diseases. Intersection of our data set with genes on cardiac clinical testing panels and the druggable genome reveals striking patterns of cellular specificity. CONCLUSIONS: Using large-scale single nuclei RNA sequencing, we defined the transcriptional and cellular diversity in the normal human heart. Our identification of discrete cell subtypes and differentially expressed genes within the heart will ultimately facilitate the development of new therapeutics for cardiovascular diseases.
Subject(s)
Myocardium/cytology , Transcription, Genetic , Adipocytes/metabolism , Adult , Aged , Cardiovascular Agents/pharmacology , Cardiovascular Agents/therapeutic use , Endothelial Cells/classification , Endothelial Cells/metabolism , Fibroblasts/classification , Fibroblasts/metabolism , Gene Ontology , Heart/innervation , Heart Atria/cytology , Heart Diseases/drug therapy , Heart Ventricles/cytology , Homeostasis , Humans , Lymphocyte Subsets/metabolism , Macrophages/classification , Macrophages/metabolism , Microfluidic Analytical Techniques , Middle Aged , Myocardium/metabolism , Myocytes, Cardiac/metabolism , Myocytes, Smooth Muscle/metabolism , Pericytes/metabolism , RNA-Seq , Sex Characteristics , Single-Cell Analysis , Transcriptome
ABSTRACT
For many years, quasicrystals were observed only as solid-state metallic alloys, yet current research is now actively exploring their formation in a variety of soft materials, including systems of macromolecules, nanoparticles and colloids. Much effort is being invested in understanding the thermodynamic properties of these soft-matter quasicrystals in order to predict and possibly control the structures that form, and hopefully to shed light on the broader yet unresolved general questions of quasicrystal formation and stability. Moreover, the ability to control the self-assembly of soft quasicrystals may contribute to the development of novel photonics or other applications based on self-assembled metamaterials. Here a path is followed, leading to quantitative stability predictions, that starts with a model developed two decades ago to treat the formation of multiple-scale quasiperiodic Faraday waves (standing wave patterns in vibrating fluid surfaces) and which was later mapped onto systems of soft particles, interacting via multiple-scale pair potentials. The article reviews, and substantially expands, the quantitative predictions of these models, while correcting a few discrepancies in earlier calculations, and presents new analytical methods for treating the models. In so doing, a number of new stable quasicrystalline structures are found with octagonal, octadecagonal and higher-order symmetries, some of which may, it is hoped, be observed in future experiments.
ABSTRACT
CTLA-4 immune checkpoint blockade is clinically effective in a subset of patients with metastatic melanoma. We identify a subcluster of MAGE-A cancer-germline antigens, located within a narrow 75 kb region of chromosome Xq28, that predicts resistance uniquely to blockade of CTLA-4, but not PD-1. We validate this gene expression signature in an independent anti-CTLA-4-treated cohort and show its specificity to the CTLA-4 pathway with two independent anti-PD-1-treated cohorts. Autophagy, a process critical for optimal anti-cancer immunity, has previously been shown to be suppressed by the MAGE-TRIM28 ubiquitin ligase in vitro. We now show that the expression of the key autophagosome component LC3B and other activators of autophagy are negatively associated with MAGE-A protein levels in human melanomas, including samples from patients with resistance to CTLA-4 blockade. Our findings implicate autophagy suppression in resistance to CTLA-4 blockade in melanoma, suggesting exploitation of autophagy induction for potential therapeutic synergy with CTLA-4 inhibitors.
Subject(s)
CTLA-4 Antigen/genetics , CTLA-4 Antigen/immunology , Epigenesis, Genetic , Germ-Line Mutation , Neoplasms/genetics , Neoplasms/immunology , Animals , Antibodies, Monoclonal/therapeutic use , Antigens, Neoplasm/genetics , Antigens, Neoplasm/immunology , Autophagy , Cell Line, Tumor , DNA Methylation , Female , Gene Expression Profiling , Humans , Immunotherapy , Ipilimumab/pharmacology , Male , Melanoma/genetics , Melanoma/immunology , Melanoma-Specific Antigens/genetics , Melanoma-Specific Antigens/immunology , Mice , Mice, Transgenic , Skin Neoplasms/genetics , Skin Neoplasms/immunology
ABSTRACT
We study the quench dynamics of a two-component ultracold Fermi gas from the weak into the strong interaction regime, where the short time dynamics are governed by the exponential growth rate of unstable collective modes. We obtain an effective interaction that takes into account both Pauli blocking and the energy dependence of the scattering amplitude near a Feshbach resonance. Using this interaction we analyze the competing instabilities towards Stoner ferromagnetism and pairing.