Results 1 - 11 of 11
1.
Nature ; 582(7810): 100-103, 2020 06.
Article in English | MEDLINE | ID: mdl-32461694

ABSTRACT

Cancers develop as a result of driver mutations [1,2] that lead to clonal outgrowth and the evolution of disease [3,4]. The discovery and functional characterization of individual driver mutations are central aims of cancer research, and have elucidated myriad phenotypes [5] and therapeutic vulnerabilities [6]. However, the serial genetic evolution of mutant cancer genes [7,8] and the allelic context in which they arise are poorly understood in both common and rare cancer genes and tumour types. Here we find that nearly one in four human tumours contains a composite mutation of a cancer-associated gene, defined as two or more nonsynonymous somatic mutations in the same gene and tumour. Composite mutations are enriched in specific genes, have an elevated rate of use of less-common hotspot mutations acquired in a chronology driven in part by oncogenic fitness, and arise in an allelic configuration that reflects context-specific selective pressures. cis-acting composite mutations are hypermorphic in some genes in which dosage effects predominate (such as TERT), whereas they lead to selection of function in other genes (such as TP53). Collectively, composite mutations are driver alterations that arise from context- and allele-specific selective pressures that are dependent in part on gene and mutation function, and which lead to complex, often neomorphic, functions of biological and therapeutic importance.


Subject(s)
Carcinogenesis/genetics; Models, Genetic; Mutation; Neoplasms/genetics; Oncogenes/genetics; Alleles; Animals; Female; Genes, p53/genetics; Humans; Mice; Selection, Genetic; Telomerase/genetics
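
The definition of a composite mutation given in this abstract (two or more nonsynonymous somatic mutations in the same gene and tumour) can be illustrated with a minimal sketch. The record layout, the function name `composite_mutations`, and the toy calls below are assumptions made for illustration only; they are not the authors' pipeline or data.

```python
from collections import defaultdict

# Hypothetical toy records: (tumour_id, gene, variant_label, is_nonsynonymous).
# The variant labels are placeholders, not real mutation calls.
calls = [
    ("T1", "TP53", "var1", True),
    ("T1", "TP53", "var2", True),
    ("T1", "TERT", "var3", True),
    ("T2", "TERT", "var4", False),  # synonymous, so it does not count
    ("T2", "TERT", "var5", True),
]


def composite_mutations(records):
    """Return {(tumour, gene): [variants]} for pairs carrying at least
    two distinct nonsynonymous mutations."""
    grouped = defaultdict(set)
    for tumour, gene, variant, nonsynonymous in records:
        if nonsynonymous:
            grouped[(tumour, gene)].add(variant)
    return {key: sorted(v) for key, v in grouped.items() if len(v) >= 2}


result = composite_mutations(calls)
print(result)
```

In this toy input only the TP53 pair in tumour T1 qualifies; T2 has two TERT calls, but one of them is synonymous.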
2.
Nature ; 569(7757): 576-580, 2019 05.
Article in English | MEDLINE | ID: mdl-31092926

ABSTRACT

Genetic and epigenetic intra-tumoral heterogeneity cooperate to shape the evolutionary course of cancer [1]. Chronic lymphocytic leukaemia (CLL) is a highly informative model for cancer evolution as it undergoes substantial genetic diversification and evolution after therapy [2,3]. The CLL epigenome is also an important disease-defining feature [4,5], and growing populations of cells in CLL diversify by stochastic changes in DNA methylation known as epimutations [6]. However, previous studies using bulk sequencing methods to analyse the patterns of DNA methylation were unable to determine whether epimutations affect CLL populations homogeneously. Here, to measure the epimutation rate at single-cell resolution, we applied multiplexed single-cell reduced-representation bisulfite sequencing to B cells from healthy donors and patients with CLL. We observed that the common clonal origin of CLL results in a consistently increased epimutation rate, with low variability in the cell-to-cell epimutation rate. By contrast, variable epimutation rates across healthy B cells reflect diverse evolutionary ages across the trajectory of B cell differentiation, consistent with epimutations serving as a molecular clock. Heritable epimutation information allowed us to reconstruct lineages at high-resolution with single-cell data, and to apply this directly to patient samples. The CLL lineage tree shape revealed earlier branching and longer branch lengths than in normal B cells, reflecting rapid drift after the initial malignant transformation and a greater proliferative history. Integration of single-cell bisulfite sequencing analysis with single-cell transcriptomes and genotyping confirmed that genetic subclones mapped to distinct clades, as inferred solely on the basis of epimutation information. Finally, to examine potential lineage biases during therapy, we profiled serial samples during ibrutinib-associated lymphocytosis, and identified clades of cells that were preferentially expelled from the lymph node after treatment, marked by distinct transcriptional profiles. The single-cell integration of genetic, epigenetic and transcriptional information thus charts the lineage history of CLL and its evolution with therapy.


Subject(s)
Cell Lineage; Epigenesis, Genetic; Evolution, Molecular; Leukemia, Lymphocytic, Chronic, B-Cell/genetics; Leukemia, Lymphocytic, Chronic, B-Cell/pathology; Base Sequence; Biological Clocks; Cell Lineage/genetics; DNA Methylation; Epigenome/genetics; Gene Expression Regulation, Neoplastic; Humans; Leukemia, Lymphocytic, Chronic, B-Cell/metabolism; Mutation Rate; Sequence Analysis, RNA; Single-Cell Analysis; Transcription, Genetic
3.
Genome Biol ; 25(1): 35, 2024 01 25.
Article in English | MEDLINE | ID: mdl-38273415

ABSTRACT

Targeted spatial transcriptomics hold particular promise in analyzing complex tissues. Most such methods, however, measure only a limited panel of transcripts, which need to be selected in advance to inform on the cell types or processes being studied. A limitation of existing gene selection methods is their reliance on scRNA-seq data, ignoring platform effects between technologies. Here we describe gpsFISH, a computational method performing gene selection through optimizing detection of known cell types. By modeling and adjusting for platform effects, gpsFISH outperforms other methods. Furthermore, gpsFISH can incorporate cell type hierarchies and custom gene preferences to accommodate diverse design requirements.


Subject(s)
Gene Expression Profiling; Genetic Techniques; Single-Cell Analysis; Transcriptome; Sequence Analysis, RNA
4.
Nat Commun ; 15(1): 6611, 2024 Aug 04.
Article in English | MEDLINE | ID: mdl-39098889

ABSTRACT

Identifying cellular identities is a key use case in single-cell transcriptomics. While machine learning has been leveraged to automate cell annotation predictions for some time, there has been little progress in scaling neural networks to large data sets and in constructing models that generalize well across diverse tissues. Here, we propose scTab, an automated cell type prediction model specific to tabular data, and train it using a novel data augmentation scheme across a large corpus of single-cell RNA-seq observations (22.2 million cells). In this context, we show that cross-tissue annotation requires nonlinear models and that the performance of scTab scales both in terms of training dataset size and model size. Additionally, we show that the proposed data augmentation schema improves model generalization. In summary, we introduce a de novo cell type prediction model for single-cell RNA-seq data that can be trained across a large-scale collection of curated datasets and demonstrate the benefits of using deep learning methods in this paradigm.


Subject(s)
Single-Cell Analysis; Single-Cell Analysis/methods; Humans; RNA-Seq/methods; Machine Learning; Neural Networks, Computer; Transcriptome; Sequence Analysis, RNA/methods; Animals; Computational Biology/methods; Deep Learning; Gene Expression Profiling/methods; Algorithms
5.
Nat Biotechnol ; 2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39313646

ABSTRACT

Tissue-level and organism-level biological processes often involve the coordinated action of multiple distinct cell types. The recent application of single-cell assays to many individuals should enable the study of how donor-level variation in one cell type is linked to that in other cell types. Here we introduce a computational approach called single-cell interpretable tensor decomposition (scITD) to identify common axes of interindividual variation by considering joint expression variation across multiple cell types. scITD combines expression matrices from each cell type into a higher-order matrix and factorizes the result using the Tucker tensor decomposition. Applying scITD to single-cell RNA-sequencing data on 115 persons with lupus and 83 persons with coronavirus disease 2019, we identify patterns of coordinated cellular activity linked to disease severity and specific phenotypes, such as lupus nephritis. scITD results also implicate specific signaling pathways likely mediating coordination between cell types. Overall, scITD offers a tool for understanding the covariation of cell states across individuals, which can yield insights into the complex processes that define and stratify disease.
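
The stacking-and-factorizing step described in this abstract can be sketched on a toy array. The sketch below computes a Tucker decomposition with the truncated higher-order SVD (HOSVD), a simple standard way to obtain one; it is not claimed to be the procedure scITD itself uses. The tensor layout (donors x genes x cell types), the function names, and the random data are all assumptions made for illustration.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize: rows index `mode`, columns index all remaining modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` along `mode` by `matrix` (shape: new_dim x old_dim)."""
    moved = np.moveaxis(tensor, mode, 0)
    out = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(out, 0, mode)


def hosvd(tensor, ranks):
    """Truncated HOSVD: one orthonormal factor per mode plus a core tensor."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Map the core back to the original space through the factors."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


# Toy data (assumed layout): 6 donors x 5 genes x 4 cell types.
rng = np.random.default_rng(0)
expression = rng.normal(size=(6, 5, 4))

core, factors = hosvd(expression, ranks=(2, 3, 2))
print(core.shape, [f.shape for f in factors])
```

Under this assumed layout, the rows of the first factor matrix give one score per donor for each retained component, which is the kind of donor-level axis of variation the abstract refers to.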

6.
bioRxiv ; 2023 Oct 10.
Article in English | MEDLINE | ID: mdl-37873298

ABSTRACT

Identifying cellular identities (both novel and well-studied) is one of the key use cases in single-cell transcriptomics. While supervised machine learning has been leveraged to automate cell annotation predictions for some time, there has been relatively little progress both in scaling neural networks to large data sets and in constructing models that generalize well across diverse tissues and biological contexts up to whole organisms. Here, we propose scTab, an automated, feature-attention-based cell type prediction model specific to tabular data, and train it using a novel data augmentation scheme across a large corpus of single-cell RNA-seq observations (22.2 million human cells in total). In addition, scTab leverages deep ensembles for uncertainty quantification. Moreover, we account for ontological relationships between labels in the model evaluation to accommodate differences in annotation granularity across datasets. On this large-scale corpus, we show that cross-tissue annotation requires nonlinear models and that the performance of scTab scales in terms of training dataset size as well as model size, demonstrating the advantage of scTab over current state-of-the-art linear models in this context. Additionally, we show that the proposed data augmentation schema improves model generalization. In summary, we introduce a de novo cell type prediction model for single-cell RNA-seq data that can be trained across a large-scale collection of curated datasets from a diverse selection of human tissues and demonstrate the benefits of using deep learning methods in this paradigm. Our codebase, training data, and model checkpoints are publicly available at https://github.com/theislab/scTab to further enable rigorous benchmarks of foundation models for single-cell RNA-seq data.

7.
Nat Biotechnol ; 41(3): 417-426, 2023 03.
Article in English | MEDLINE | ID: mdl-36163550

ABSTRACT

Genome instability and aberrant alterations of transcriptional programs both play important roles in cancer. Single-cell RNA sequencing (scRNA-seq) has the potential to investigate both genetic and nongenetic sources of tumor heterogeneity in a single assay. Here we present a computational method, Numbat, that integrates haplotype information obtained from population-based phasing with allele and expression signals to enhance detection of copy number variations from scRNA-seq. Numbat exploits the evolutionary relationships between subclones to iteratively infer single-cell copy number profiles and tumor clonal phylogeny. Analysis of 22 tumor samples, including multiple myeloma, gastric, breast and thyroid cancers, shows that Numbat can reconstruct the tumor copy number profile and precisely identify malignant cells in the tumor microenvironment. We identify genetic subpopulations with transcriptional signatures relevant to tumor progression and therapy resistance. Numbat requires neither sample-matched DNA data nor a priori genotyping, and is applicable to a wide range of experimental settings and cancer types.


Subject(s)
Multiple Myeloma; Transcriptome; Humans; Transcriptome/genetics; DNA Copy Number Variations/genetics; Haplotypes/genetics; Phylogeny; Single-Cell Analysis/methods; Tumor Microenvironment
8.
bioRxiv ; 2023 Mar 24.
Article in English | MEDLINE | ID: mdl-36993340

ABSTRACT

Targeted spatial transcriptomics hold particular promise in analysis of complex tissues. Most such methods, however, measure only a limited panel of transcripts, which need to be selected in advance to inform on the cell types or processes being studied. A limitation of existing gene selection methods is that they rely on scRNA-seq data, ignoring platform effects between technologies. Here we describe gpsFISH, a computational method to perform gene selection through optimizing detection of known cell types. By modeling and adjusting for platform effects, gpsFISH outperforms other methods. Furthermore, gpsFISH can incorporate cell type hierarchies and custom gene preferences to accommodate diverse design requirements.

9.
Viruses ; 12(12), 2020 12 10.
Article in English | MEDLINE | ID: mdl-33322070

ABSTRACT

Viruses represent important test cases for data federation due to their genome size and the rapid increase in sequence data in publicly available databases. However, some consequences of previously decentralized (unfederated) data are lack of consensus or comparisons between feature annotations. Unifying or displaying alternative annotations should be a priority both for communities with robust entry representation and for nascent communities with burgeoning data sources. To this end, during this three-day continuation of the Virus Hunting Toolkit codeathon series (VHT-2), a new integrated and federated viral index was elaborated. This Federated Index of Viral Experiments (FIVE) integrates pre-existing and novel functional and taxonomy annotations and virus-host pairings. Variability in the context of viral genomic diversity is often overlooked in virus databases. As a proof-of-concept, FIVE was the first attempt to include viral genome variation for HIV, the most well-studied human pathogen, through viral genome diversity graphs. As of the publication of this manuscript, FIVE is the first implementation of a virus-specific federated index of such scope. FIVE is coded in BigQuery for optimal access to large quantities of data and is publicly accessible. Many database or index federation projects fail to provide easier alternatives for accessing or querying information. To this end, a Python API query system was developed to enhance the accessibility of FIVE.


Subject(s)
Computational Biology; Databases, Genetic; Metagenomics/methods; Viruses/genetics; Computational Biology/methods; Genetic Variation; Genome, Viral; Host-Pathogen Interactions; Humans; User-Computer Interface; Viral Proteins/genetics; Viral Proteins/metabolism; Viruses/metabolism; Web Browser
10.
F1000Res ; 8: 1751, 2019.
Article in English | MEDLINE | ID: mdl-34386196

ABSTRACT

In March 2019, 45 scientists and software engineers from around the world converged at the University of California, Santa Cruz for the first pangenomics codeathon. The purpose of the meeting was to propose technical specifications and standards for a usable human pangenome as well as to build relevant tools for genome graph infrastructures. During the meeting, the group held several intense and productive discussions covering a diverse set of topics, including advantages of graph genomes over a linear reference representation, design of new methods that can leverage graph-based data structures, and novel visualization and annotation approaches for pangenomes. Additionally, the participants self-organized into teams that worked intensely over a three-day period to build a set of pipelines and tools for specific pangenomic applications. A summary of the questions raised and the tools developed is reported in this manuscript.

11.
F1000Res ; 7: 1391, 2018.
Article in English | MEDLINE | ID: mdl-30613392

ABSTRACT

Genome graphs are emerging as an important novel approach to the analysis of high-throughput human sequencing data. By explicitly representing genetic variants and alternative haplotypes in a mappable data structure, they can enable the improved analysis of structurally variable and hyperpolymorphic regions of the genome. In most existing approaches, graphs are constructed from variant call sets derived from short-read sequencing. As long-read sequencing becomes more cost-effective and enables de novo assembly for increasing numbers of whole genomes, a method for the direct construction of a genome graph from sets of assembled human genomes would be desirable. Such assembly-based genome graphs would encompass the wide spectrum of genetic variation accessible to long-read-based de novo assembly, including large structural variants and divergent haplotypes. Here we present NovoGraph, a method for the construction of a human genome graph directly from a set of de novo assemblies. NovoGraph constructs a genome-wide multiple sequence alignment of all input contigs and creates a graph by merging the input sequences at positions that are both homologous and sequence-identical. NovoGraph outputs resulting graphs in VCF format that can be loaded into third-party genome graph toolkits. To demonstrate NovoGraph, we construct a genome graph with 23,478,835 variant sites and 30,582,795 variant alleles from de novo assemblies of seven ethnically diverse human genomes (AK1, CHM1, CHM13, HG003, HG004, HX1, NA19240). Initial evaluations show that mapping against the constructed graph reduces the average mismatch rate of reads from sample NA12878 by approximately 0.2%, albeit at a slightly increased rate of reads that remain unmapped.


Subject(s)
Genome, Human; High-Throughput Nucleotide Sequencing; Haplotypes; Humans; Sequence Alignment; Sequence Analysis, DNA
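
The merging idea described in this abstract (joining input sequences at positions that are both homologous and sequence-identical) can be illustrated on a toy multiple sequence alignment. The sketch below is a deliberately simplified stand-in, not NovoGraph's implementation: each alignment column is treated as a homologous position, identical bases in a column collapse into one node, and consecutive non-gap bases of each sequence are linked by an edge. The function names and the three-row alignment are hypothetical.

```python
def msa_to_graph(alignment):
    """Build a toy graph from equal-length aligned rows ('-' marks a gap).

    Nodes are (column, base) pairs, so identical bases at the same
    homologous column merge into a single node.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    nodes, edges = set(), set()
    for row in alignment:
        previous = None
        for column, base in enumerate(row):
            if base == "-":
                continue  # gaps add no node; the path skips this column
            node = (column, base)
            nodes.add(node)
            if previous is not None:
                edges.add((previous, node))
            previous = node
    return nodes, edges


def variant_columns(nodes):
    """Columns holding more than one distinct base (substitution sites)."""
    counts = {}
    for column, _ in nodes:
        counts[column] = counts.get(column, 0) + 1
    return sorted(c for c, n in counts.items() if n > 1)


# Hypothetical aligned contigs: a substitution at column 2, a deletion at column 4.
toy_alignment = ["ACGT-A", "ACCTTA", "ACGTTA"]
nodes, edges = msa_to_graph(toy_alignment)
print(len(nodes), len(edges), variant_columns(nodes))
```

In this sketch the deletion in the first row shows up only as an extra edge that skips column 4; `variant_columns` reports substitutions and does not count such gap-only sites.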