ABSTRACT
A fundamental task in single-cell RNA-seq (scRNA-seq) analysis is the identification of transcriptionally distinct groups of cells. Numerous methods have been proposed for this problem, with a recent focus on methods for the cluster analysis of ultralarge scRNA-seq data sets produced by droplet-based sequencing technologies. Most existing methods rely on a sampling step to bridge the gap between algorithm scalability and the volume of the data. Ignoring large parts of the data, however, often yields inaccurate groupings of cells and risks overlooking rare cell types. We propose Specter, a method that adopts and extends recent algorithmic advances in (fast) spectral clustering. In contrast to methods that cluster a (random) subsample of the data, we adopt the idea of landmarks, which are used to create a sparse representation of the full data from which a spectral embedding can then be computed in linear time. We exploit Specter's speed in a cluster ensemble scheme that achieves a substantial improvement in accuracy over existing methods and identifies rare cell types with high sensitivity. Its linear-time complexity allows Specter to scale to millions of cells and leads to fast computation times in practice. Furthermore, on CITE-seq data that simultaneously measure gene and protein marker expression, we show that Specter is able to use multimodal omics measurements to resolve subtle transcriptomic differences between subpopulations of cells.
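The landmark idea described in the abstract can be illustrated with a minimal sketch of generic landmark-based spectral clustering. This is not Specter's implementation: the function names, the random choice of landmarks, the Gaussian weighting, and all parameter defaults are assumptions made for illustration, the cluster ensemble step is omitted, and dense NumPy arrays are used for brevity where a scalable version would store the representation as a sparse matrix.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means with farthest-point initialisation (helper)."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def landmark_spectral_clustering(X, k, n_landmarks=50, n_neighbors=5, seed=0):
    """Cluster the rows of X via a landmark-based spectral embedding (sketch)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # 1. Pick landmarks. Assumption: a uniform random subset of the data;
    #    k-means centres are a common alternative.
    landmarks = X[rng.choice(n, size=n_landmarks, replace=False)]

    # 2. Represent every point by Gaussian weights on its nearest landmarks.
    #    Each row of Z has only n_neighbors non-zero entries.
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :n_neighbors]
    rows = np.arange(n)[:, None]
    sigma2 = d2[rows, nearest].mean() + 1e-12  # heuristic bandwidth
    Z = np.zeros((n, n_landmarks))
    Z[rows, nearest] = np.exp(-d2[rows, nearest] / (2 * sigma2))
    Z /= Z.sum(axis=1, keepdims=True)

    # 3. Scale columns so that Z_hat @ Z_hat.T acts as a normalised affinity.
    col_sums = Z.sum(axis=0)
    Z_hat = Z / np.sqrt(np.maximum(col_sums, 1e-12))

    # 4. The top-k left singular vectors of the thin n x p matrix give the
    #    spectral embedding without ever forming an n x n affinity matrix.
    U, _, _ = np.linalg.svd(Z_hat, full_matrices=False)
    emb = U[:, :k]
    emb = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12)

    # 5. Cluster the embedded points.
    return kmeans(emb, k, rng)
```

Because the number of landmarks is fixed, the cost of building the representation and of the thin SVD grows linearly with the number of cells, which is the property the abstract relies on.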
Subjects
Cluster Analysis , Gene Expression Profiling , RNA-Seq , Single-Cell Analysis , Algorithms
ABSTRACT
MOTIVATION: Alternative splicing (AS) of introns from pre-mRNA produces diverse sets of transcripts across cell types and tissues, but is also dysregulated in many diseases. Alignment-free computational methods have greatly accelerated the quantification of mRNA transcripts from short RNA-seq reads, but they inherently rely on a catalog of known transcripts and might miss novel, disease-specific splicing events. By contrast, alignment of reads to the genome can effectively identify novel exonic segments and introns. Event-based methods then count how many reads align to predefined features. However, an alignment is more expensive to compute and constitutes a bottleneck in many AS analysis methods. RESULTS: Here, we propose fortuna, a method that guesses novel combinations of annotated splice sites to create transcript fragments. It then pseudoaligns reads to fragments using kallisto and efficiently derives counts of the most elementary splicing units from kallisto's equivalence classes. These counts can be directly used for AS analysis or summarized to larger units as used by other widely applied methods. In experiments on synthetic and real data, fortuna was around 7× faster than traditional align-and-count approaches, and was able to analyze almost 300 million reads in just 15 min when using four threads. It mapped reads containing mismatches more accurately across novel junctions and found more reads supporting aberrant splicing events in patients with autism spectrum disorder than existing methods. We further used fortuna to identify novel, tissue-specific splicing events in Drosophila. AVAILABILITY AND IMPLEMENTATION: fortuna source code is available at https://github.com/canzarlab/fortuna.
Subjects
Autism Spectrum Disorder , Humans , RNA Sequence Analysis/methods , RNA Splicing , Alternative Splicing , Software
ABSTRACT
Psychiatric disorders are a collection of heterogeneous mental disorders arising from a combination of genetic and environmental insults, many of which molecularly converge on transcriptional dysregulation, resulting in altered synaptic functions. The underlying mechanisms linking the genetic lesion and functional phenotypes remain largely unknown. Patient iPSC-derived neurons with a rare frameshift DISC1 (Disrupted-in-schizophrenia 1) mutation have previously been shown to exhibit aberrant gene expression and deficits in synaptic functions. How DISC1 regulates gene expression is largely unknown. Here we show that Activating Transcription Factor 4 (ATF4), a DISC1 binding partner, is more abundant in the nucleus of DISC1 mutant human neurons and exhibits enhanced binding to a collection of dysregulated genes. Functionally, overexpressing ATF4 in control neurons recapitulates deficits seen in DISC1 mutant neurons, whereas transcriptional and synaptic deficits are rescued in DISC1 mutant neurons with CRISPR-mediated heterozygous ATF4 knockout. By solving the high-resolution atomic structure of the DISC1-ATF4 complex, we show that mechanistically, the mutation of DISC1 disrupts normal DISC1-ATF4 interaction and results in excessive ATF4 binding to DNA targets and deregulated gene expression. Together, our study identifies the molecular and structural basis of a DISC1-ATF4 interaction underlying transcriptional and synaptic dysregulation in an iPSC model of mental disorders.
Subjects
Induced Pluripotent Stem Cells , Mental Disorders , Activating Transcription Factor 4/genetics , Humans , Nerve Tissue Proteins/genetics , Neurons
ABSTRACT
Kenny-Caffey syndrome (KCS) and the similar but more severe osteocraniostenosis (OCS) are genetic conditions characterized by impaired skeletal development with small and dense bones, short stature, and primary hypoparathyroidism with hypocalcemia. We studied five individuals with KCS and five with OCS and found that all of them had heterozygous mutations in FAM111A. One mutation was identified in four unrelated individuals with KCS, and another one was identified in two unrelated individuals with OCS; all occurred de novo. Thus, OCS and KCS are allelic disorders of different severity. FAM111A codes for a 611 amino acid protein with homology to trypsin-like peptidases. Although FAM111A has been found to bind to the large T-antigen of SV40 and restrict viral replication, its native function is unknown. Molecular modeling of FAM111A shows that residues affected by KCS and OCS mutations do not map close to the active site but are clustered on a segment of the protein and are at, or close to, its outer surface, suggesting that the pathogenesis involves the interaction with as yet unidentified partner proteins rather than impaired catalysis. FAM111A appears to be crucial to a pathway that governs parathyroid hormone production, calcium homeostasis, and skeletal development and growth.
Subjects
Multiple Abnormalities/genetics , Developmental Bone Diseases/genetics , Craniofacial Abnormalities/genetics , Dwarfism/genetics , Congenital Cortical Hyperostosis/genetics , Hypocalcemia/genetics , Hypoparathyroidism/genetics , Virus Receptors/genetics , Multiple Abnormalities/diagnostic imaging , Multiple Abnormalities/mortality , Multiple Abnormalities/pathology , Adolescent , Adult , Developmental Bone Diseases/mortality , Developmental Bone Diseases/pathology , Child , Craniofacial Abnormalities/mortality , Craniofacial Abnormalities/pathology , Dwarfism/diagnostic imaging , Dwarfism/mortality , Genetic Association Studies , Heterozygote , Humans , Congenital Cortical Hyperostosis/diagnostic imaging , Congenital Cortical Hyperostosis/mortality , Hypocalcemia/diagnostic imaging , Hypocalcemia/mortality , Hypoparathyroidism/diagnostic imaging , Hypoparathyroidism/mortality , Infant , Newborn Infant , Male , Missense Mutation , Parathyroid Hormone/deficiency , Radiography
ABSTRACT
Interaction of malignancies with tissue-specific immune cells has gained interest for prognosis and intervention of emerging immunotherapies. We analyzed bone marrow T cells (bmT) as tumor-infiltrating lymphocytes in pediatric precursor-B cell acute lymphoblastic leukemia (ALL). Based on data from 100 patients, we show that ALL is associated with a late-stage CD4+ phenotype and loss of early CD8+ T cells. The inhibitory exhaustion marker TIM-3 on CD4+ bmT increased relapse risk (RFS = 94.6/70.3%), as confirmed by multivariate analysis. The hazard ratio of TIM-3 expression nearly reached the hazard ratio of MRD (7.1 vs. 8.0), indicating that patients with a high frequency of TIM-3+CD4+ bone marrow T cells at initial diagnosis have a 7.1-fold increased risk of developing ALL relapse. Comparison of wild-type primary T cells to CRISPR/Cas9-mediated TIM-3 knockout and TIM-3 overexpression confirmed the negative effect of TIM-3 on T cell responses against ALL. TIM-3+CD4+ bmT are increased in ALL overexpressing CD200, which leads to dysfunctional antileukemic T cell responses. In conclusion, TIM-3-mediated interaction between bmT and leukemia cells is shown to be a strong risk factor for relapse in pediatric B-lineage ALL. CD200/TIM-3 signaling, rather than PD-1/PD-L1, is uncovered as a mechanism of T cell dysfunction in ALL, with major implications for future immunotherapies.
Subjects
Bone Marrow Cells/immunology , CD4 Antigens/immunology , Hepatitis A Virus Cellular Receptor 2/immunology , Precursor B-Cell Lymphoblastic Leukemia-Lymphoma/immunology , Adolescent , Tumor Biomarkers , Child , Preschool Child , Female , Humans , Infant , Male , Precursor B-Cell Lymphoblastic Leukemia-Lymphoma/pathology , Prognosis , Recurrence , Risk Factors
ABSTRACT
N6-methyladenosine (m6A) modification of mRNA is emerging as a vital mechanism regulating RNA function. Here, we show that fragile X mental retardation protein (FMRP) reads m6A to promote nuclear export of methylated mRNA targets during neural differentiation. Fmr1 knockout (KO) mice show delayed neural progenitor cell cycle progression and extended maintenance of proliferating neural progenitors into postnatal stages, phenocopying methyltransferase Mettl14 conditional KO (cKO) mice that have no m6A modification. RNA-seq and m6A-seq reveal that both Mettl14cKO and Fmr1KO lead to the nuclear retention of m6A-modified FMRP target mRNAs regulating neural differentiation, indicating that both m6A and FMRP are required for the nuclear export of methylated target mRNAs. FMRP preferentially binds m6A-modified RNAs to facilitate their nuclear export through CRM1. The nuclear retention defect can be mitigated by wild-type but not nuclear export-deficient FMRP, establishing a critical role for FMRP in mediating m6A-dependent mRNA nuclear export during neural differentiation.
Subjects
Adenosine/analogs & derivatives , Cell Differentiation , Fragile X Mental Retardation Protein/metabolism , Neurons/cytology , Neurons/metabolism , RNA Transport , Cell Nucleus Active Transport , Adenosine/metabolism , Animals , Newborn Animals , Cell Cycle , Cell Proliferation , Cerebral Cortex/cytology , Gene Deletion , Karyopherins/metabolism , Knockout Mice , Neural Stem Cells/metabolism , Messenger RNA/genetics , Messenger RNA/metabolism , Cytoplasmic and Nuclear Receptors/metabolism , Exportin 1 Protein
ABSTRACT
Leucine-rich repeat (LRR) domains are evolutionarily conserved in proteins that function in development and immunity. Here we report strict exonic modularity of the LRR domains of several human gene families, which is a precondition for alternative splicing (AS). We provide evidence for AS of the LRR domain within several Nod-like receptors, most prominently the inflammasome sensor NLRP3. Human NLRP3, but not mouse NLRP3, is expressed as two major isoforms, the full-length variant and a variant lacking exon 5. Moreover, NLRP3 AS is stochastically regulated, with NLRP3 ∆ exon 5 lacking the interaction surface for NEK7 and hence showing loss of activity. Our data thus reveal unexpected regulatory roles of AS through differential utilization of LRR modules in vertebrate innate immunity.