ABSTRACT
Tandem repeats of simple sequence motifs, also known as microsatellites, are abundant in the genome. Because their repeat structure makes replication error-prone, variant microsatellite lengths are often generated during germline and somatic expansions. As such, microsatellite length variations can serve as markers for cancer. However, accurate, error-free measurement of microsatellite lengths is difficult with current methods precisely because of this high error rate during amplification. We have solved this problem by using partial mutagenesis to disrupt enough of the repeat structure of initial templates that their sequence lengths replicate faithfully. In this work, we use bisulfite mutagenesis to convert C to U, which is later read as T. Compared to untreated templates, we achieve a three-order-of-magnitude reduction in the error rate per round of replication. By requiring agreement between two independent first copies of an initial template, we reach error rates below one in a million. We apply this method to a thousand microsatellite loci from the human genome, revealing microsatellite length distributions not observable without mutagenesis.
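The two-copy agreement rule can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each read has already been assigned to one of the two first copies of a template and reduced to a measured repeat length, and the helper names and the majority-vote step are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def copy_length(read_lengths: Sequence[int]) -> Optional[int]:
    """Majority repeat length among reads from one first copy (hypothetical helper)."""
    if not read_lengths:
        return None
    counts = Counter(read_lengths).most_common()
    # Refuse to call a length when the two most frequent values tie.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def template_length(copy_a: Sequence[int], copy_b: Sequence[int]) -> Optional[int]:
    """Call the initial template's length only if both first copies agree."""
    a, b = copy_length(copy_a), copy_length(copy_b)
    if a is None or b is None or a != b:
        return None  # disagreement or no call: discard the template
    return a


print(template_length([12, 12, 12, 13], [12, 12]))  # 12
print(template_length([12, 12], [13, 13]))          # None
```

The design choice is to discard rather than guess: if each first copy independently carries a length error with probability e, a wrong call requires both copies to err in the same way, which under that independence assumption happens with probability at most e².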
Subjects
Human Genome, Microsatellite Repeats, Site-Directed Mutagenesis, Humans, Microsatellite Repeats/genetics, Site-Directed Mutagenesis/methods
ABSTRACT
Short-read sequencers provide highly accurate reads at very low cost. Unfortunately, short reads are often inadequate for important applications such as assembly in complex regions or phasing across distant heterozygous sites. In this study, we describe novel bench protocols and algorithms to obtain haplotype-phased sequence assemblies with ultra-low error for regions 10 kb and longer using short reads only. We accomplish this by imprinting each template strand from a target region with a dense and unique mutation pattern. The mutation process randomly and independently converts ∼50% of cytosines to uracils. Sequencing libraries are made from both mutated and unmutated templates. Using de Bruijn graphs and paired-end read information, we assemble each mutated template and use the unmutated library to correct the mutated bases. Templates are partitioned into two or more haplotypes, and the final haplotypes are assembled and corrected for residual template mutations and PCR errors. With sufficient template coverage, the final assemblies have per-base error rates below 10⁻⁹. We demonstrate this method on a four-member nuclear family, correctly assembling and phasing three genomic intervals, including the highly polymorphic HLA-B gene.
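The imprinting step can be simulated with a short sketch. It assumes, as the abstract states, that each cytosine is converted independently at a fixed rate (∼50%); it models a single strand only and ignores methylation protection and strand bookkeeping. The function name `imprint` is hypothetical.

```python
import random
from typing import Optional


def imprint(template: str, rate: float = 0.5, seed: Optional[int] = None) -> str:
    """Return one mutated copy of `template` (hypothetical helper).

    Each C is independently converted to U with probability `rate`; U is
    written as T because that is how it is read after PCR and sequencing.
    All other bases are left untouched.
    """
    rng = random.Random(seed)
    return "".join(
        "T" if base == "C" and rng.random() < rate else base
        for base in template
    )


template = "ACGTCCGA" * 50
copy_1 = imprint(template, seed=1)
copy_2 = imprint(template, seed=2)
# copy_1 and copy_2 come from the same sequence but carry different C->T patterns.
```

Because every later copy inherits the pattern, reads from one mutated template can be told apart from reads of another template with the same underlying sequence.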
Subjects
Genome, Genomics, Algorithms, HLA-B Antigens, Haplotypes, High-Throughput Nucleotide Sequencing/methods, Mutagenesis, DNA Sequence Analysis/methods
ABSTRACT
We show the use of 5'-Acrydite oligonucleotides to copolymerize single-cell DNA or RNA into balls of acrylamide gel (BAGs). Combining this step with split-and-pool techniques for creating barcodes yields a method with advantages in cost and scalability, depth of coverage, ease of operation, minimal cross-contamination, and efficient use of samples. We perform DNA copy number profiling on mixtures of cell lines, nuclei from frozen prostate tumors, and biopsy washes. As applied to RNA, the method has high capture efficiency of transcripts and sufficient consistency to clearly distinguish the expression patterns of cell lines and individual nuclei from neurons dissected from the mouse brain. By using varietal tags (UMIs) to achieve sequence error correction, we show extremely low levels of cross-contamination by tracking source-specific SNVs. The method is readily modifiable, and we will discuss its adaptability and diverse applications.
Subjects
Acrylamide, Nucleic Acids, Single-Cell Analysis/methods, Acrylamide/chemistry, DNA, DNA Contamination, DNA Copy Number Variations, Gene Dosage, Gene Expression Profiling/methods, Gene Expression Profiling/standards, Gene Library, Humans, Neoplasms/genetics, Neoplasms/metabolism, Neoplasms/pathology, Nucleic Acids/chemistry, Oligonucleotide Array Sequence Analysis/methods, Oligonucleotide Array Sequence Analysis/standards, Polymerization, RNA, Single-Cell Analysis/standards
ABSTRACT
Cancers are highly heterogeneous and contain many passenger and driver mutations. To functionally identify tumor suppressor genes relevant to human cancer, we compiled pools of short hairpin RNAs (shRNAs) targeting the mouse orthologs of genes recurrently deleted in a series of human hepatocellular carcinomas and tested their ability to promote tumorigenesis in a mosaic mouse model. In contrast to randomly selected shRNA pools, many deletion-specific pools accelerated hepatocarcinogenesis in mice. Through further analysis, we identified and validated 13 tumor suppressor genes, 12 of which had not been linked to cancer before. One gene, XPO4, encodes a nuclear export protein whose substrate, EIF5A2, is amplified in human tumors, is required for proliferation of XPO4-deficient tumor cells, and promotes hepatocellular carcinoma in mice. Our results establish the feasibility of in vivo RNAi screens and illustrate how combining cancer genomics, RNA interference, and mosaic mouse models can facilitate the functional annotation of the cancer genome.
Subjects
Hepatocellular Carcinoma/genetics, Tumor Suppressor Genes, Genomics, Liver Neoplasms/genetics, RNA Interference, Animals, Humans, Karyopherins/genetics, Karyopherins/metabolism, Mice, Peptide Initiation Factors/genetics, Untranslated RNA/genetics, RNA-Binding Proteins/genetics, Smad3 Protein/metabolism, Eukaryotic Translation Initiation Factor 5A
ABSTRACT
Measuring minimal residual disease in cancer has applications for prognosis, monitoring treatment and detection of recurrence. Simple sequence-based methods to detect nucleotide substitution variants have error rates (about 10⁻³) that limit sensitive detection. We developed and characterized the performance of MASQ (multiplex accurate sensitive quantitation), a method with an error rate below 10⁻⁶. MASQ counts variant templates accurately in the presence of millions of host genomes by using tags to identify each template and demanding consensus over multiple reads. Since the MASQ protocol multiplexes 50 target loci, we can both integrate signal from multiple variants and capture subclonal response to treatment. Compared to existing methods for variant detection, MASQ achieves an excellent combination of sensitivity, specificity and yield. We tested MASQ in a pilot study in acute myeloid leukemia (AML) patients who entered complete remission. After induction therapy, we detect leukemic variants in the blood and bone marrow samples of all five patients at levels ranging from 10⁻² to nearly 10⁻⁶. We observe evidence of subclonal structure and find higher target variant frequencies in patients who go on to relapse, demonstrating the potential for MASQ to quantify residual disease in AML.
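The idea of tagging each template and demanding consensus over its reads can be sketched as below. This is a generic tag-consensus illustration, not the MASQ implementation: it assumes reads are already aligned, ungapped and of equal length over one target, and the thresholds `min_reads` and `min_agreement` as well as both function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def tag_consensus(reads: Iterable[Tuple[str, str]], min_reads: int = 3,
                  min_agreement: float = 0.9) -> Dict[str, str]:
    """Collapse reads sharing a tag into one consensus sequence per template."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for tag, seq in reads:
        groups[tag].append(seq)
    consensus = {}
    for tag, seqs in groups.items():
        if len(seqs) < min_reads:
            continue  # too few reads to trust this template
        called = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            # A sporadic sequencer error in one read cannot reach the threshold.
            called.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[tag] = "".join(called)
    return consensus


def count_variant_templates(consensus: Dict[str, str], position: int, alt: str) -> int:
    """Count templates (not reads) whose consensus carries `alt` at `position`."""
    return sum(1 for seq in consensus.values() if seq[position] == alt)
```

Counting templates rather than reads is what keeps PCR duplicates and single-read errors from inflating the variant count.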
Subjects
Acute Myeloid Leukemia/genetics, Algorithms, Genomics/methods, Humans, Acute Myeloid Leukemia/therapy, Mutation, Residual Neoplasm, Pilot Projects, Recurrence, Remission Induction, Whole Genome Sequencing
ABSTRACT
In individuals with autism spectrum disorder (ASD), de novo mutations have previously been shown to be significantly correlated with lower IQ but not with the core characteristics of ASD: deficits in social communication and interaction and restricted interests and repetitive patterns of behavior. We extend these findings by demonstrating in the Simons Simplex Collection that damaging de novo mutations in ASD individuals are also significantly and convincingly correlated with measures of impaired motor skills. This correlation is not explained by a correlation between IQ and motor skills. We find that IQ and motor skills are distinctly associated with damaging mutations and, in particular, that motor skills are a more sensitive indicator of mutational severity than is IQ, as judged by mutational type and target gene. We use this finding to propose a combined classification of phenotypic severity: mild (little impairment of either), moderate (impairment mainly to motor skills), and severe (impairment of both IQ and motor skills).
Subjects
Autism Spectrum Disorder/genetics, Motor Skills/physiology, Child, Female, Genotype, Humans, Male, Mutation
ABSTRACT
Cancer cells frequently depend on chromatin regulatory activities to maintain a malignant phenotype. Here, we show that leukemia cells require the mammalian SWI/SNF chromatin remodeling complex for their survival and aberrant self-renewal potential. While Brg1, an ATPase subunit of SWI/SNF, is known to suppress tumor formation in several cell types, we found that leukemia cells instead rely on Brg1 to support their oncogenic transcriptional program, which includes Myc as one of its key targets. To account for this context-specific function, we identify a cluster of lineage-specific enhancers located 1.7 Mb downstream from Myc that are occupied by SWI/SNF as well as the BET protein Brd4. Brg1 is required at these distal elements to maintain transcription factor occupancy and for long-range chromatin looping interactions with the Myc promoter. Notably, these distal Myc enhancers coincide with a region that is focally amplified in ∼3% of acute myeloid leukemias. Together, these findings define a leukemia maintenance function for SWI/SNF that is linked to enhancer-mediated gene regulation, providing general insights into how cancer cells exploit transcriptional coactivators to maintain oncogenic gene expression programs.
Subjects
DNA-Binding Proteins/metabolism, Genetic Enhancer Elements/physiology, Neoplastic Gene Expression Regulation, Acute Myeloid Leukemia/physiopathology, c-myc Proto-Oncogene Proteins/genetics, Transcription Factors/metabolism, Tumor Cell Line, Cell Proliferation, DNA Helicases/genetics, DNA Helicases/metabolism, DNA-Binding Proteins/genetics, Genetic Enhancer Elements/genetics, Gene Knockdown Techniques, Humans, Nuclear Proteins/genetics, Nuclear Proteins/metabolism, Genetic Promoter Regions/genetics, Protein Binding, c-myc Proto-Oncogene Proteins/metabolism, Transcription Factors/genetics
ABSTRACT
The identification of the genetic components of autism spectrum disorders (ASDs) has advanced rapidly in recent years, particularly with the demonstration of de novo mutations as an important source of causality. We review these developments in light of genetic models for ASDs. We consider the number of genetic loci that underlie ASDs and the relative contributions from different mutational classes, and we discuss possible mechanisms by which these mutations might lead to dysfunction. We update the two-class risk genetic model for autism, especially in regard to children with high intelligence quotients.
Subjects
Pervasive Child Development Disorders/genetics, Genetic Predisposition to Disease/genetics, Genetic Models, Mutation, Child, Pervasive Child Development Disorders/psychology, Female, Humans, Intelligence, Male, Sex Factors
ABSTRACT
Whole exome sequencing has proven to be a powerful tool for understanding the genetic architecture of human disease. Here we apply it to more than 2,500 simplex families, each having a child with an autistic spectrum disorder. By comparing affected to unaffected siblings, we show that 13% of de novo missense mutations and 43% of de novo likely gene-disrupting (LGD) mutations contribute to 12% and 9% of diagnoses, respectively. Including copy number variants, coding de novo mutations contribute to about 30% of all simplex and 45% of female diagnoses. Almost all LGD mutations occur opposite wild-type alleles. LGD targets in affected females significantly overlap the targets in males of lower intelligence quotient (IQ), but neither overlaps significantly with targets in males of higher IQ. We estimate that LGD mutation in about 400 genes can contribute to the joint class of affected females and males of lower IQ, with an overlapping and similar number of genes vulnerable to contributory missense mutation. LGD targets in the joint class overlap with published targets for intellectual disability and schizophrenia, and are enriched for chromatin modifiers, FMRP-associated genes and embryonically expressed genes. Most of the significance for the latter comes from affected females.
Subjects
Pervasive Child Development Disorders/genetics, Genetic Predisposition to Disease/genetics, Mutation/genetics, Open Reading Frames/genetics, Child, Cluster Analysis, Exome/genetics, Female, Genes, Humans, Intelligence Tests, Male, Reproducibility of Results
ABSTRACT
We introduce a new protocol, mutational sequencing or muSeq, which uses sodium bisulfite to randomly deaminate unmethylated cytosines at a fixed and tunable rate. The muSeq protocol marks each initial template molecule with a unique mutation signature that is present in every copy of the template, and in every fragmented copy of a copy. In the sequenced read data, this signature is observed as a unique pattern of C-to-T or G-to-A nucleotide conversions. Clustering reads with the same conversion pattern enables accurate counting and long-range assembly of initial template molecules from short-read sequence data. We explore counting and low-error sequencing by profiling 135,000 restriction fragments in a PstI representation, demonstrating that muSeq improves copy number inference and significantly reduces sporadic sequencer error. We explore long-range assembly in the context of cDNA, generating contiguous transcript clusters greater than 3,000 bp in length. The muSeq assemblies reveal transcriptional diversity not observable from short-read data alone.
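Clustering reads by conversion pattern can be sketched with a greedy procedure. This is a simplified illustration under stated assumptions, not the muSeq algorithm: reads are taken as ungapped forward-strand alignments `(start, sequence)` to a known reference, so only C-to-T conversions are considered, and two reads are joined when they agree at every reference C they both cover and share at least `min_shared` such positions. All names here are hypothetical, and the greedy merge is order-dependent.

```python
from typing import Dict, List, Tuple

Read = Tuple[int, str]  # (0-based start on the reference, ungapped read sequence)


def signature(read: Read, reference: str) -> Dict[int, bool]:
    """Map each reference C covered by the read to True if it reads as T."""
    start, seq = read
    sig = {}
    for offset, base in enumerate(seq):
        pos = start + offset
        if reference[pos] == "C":
            sig[pos] = base == "T"
    return sig


def compatible(a: Dict[int, bool], b: Dict[int, bool], min_shared: int) -> bool:
    """Enough overlapping C positions, and the same conversion state at each."""
    shared = a.keys() & b.keys()
    return len(shared) >= min_shared and all(a[p] == b[p] for p in shared)


def cluster_reads(reads: List[Read], reference: str, min_shared: int = 3) -> List[List[Read]]:
    """Greedily assign each read to the first compatible cluster, else start one."""
    clusters: List[Tuple[Dict[int, bool], List[Read]]] = []
    for read in reads:
        sig = signature(read, reference)
        for merged_sig, members in clusters:
            if compatible(sig, merged_sig, min_shared):
                merged_sig.update(sig)  # the cluster's pattern grows as reads join
                members.append(read)
                break
        else:
            clusters.append((sig, [read]))
    return [members for _, members in clusters]
```

Because each cluster's signature is extended as overlapping reads join, fragments that never overlap each other directly can still be chained into one long template, which is the basis of long-range assembly.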
Subjects
DNA/chemistry, High-Throughput Nucleotide Sequencing/methods, Sulfites/chemistry, Genetic Templates, DNA/genetics, Genomics/methods, Mutation, Reproducibility of Results
ABSTRACT
We develop a method of analysis [affected to discordant sibling pairs (A2DS)] that tests if shared variants contribute to a disorder. Using a standard measure of genetic relation, test individuals are compared with a cohort of discordant sibling pairs (CDS) to derive a comparative similarity score. We ask if a test individual is more similar to an unrelated affected than to the unrelated unaffected sibling from the CDS and then sum over such individuals and pairs. Statistical significance is judged by randomly permuting the affected status in the CDS. In the analysis of published genotype data from the Simons Simplex Collection (SSC) and the Autism Genetic Resource Exchange (AGRE) cohorts of children with autism spectrum disorder (ASD), we find strong statistical evidence that the affected are more similar to the affected than to the unaffected of the CDS (P value ∼ 0.00001). Fathers in multiplex families have marginally greater similarity (P value = 0.02) to unrelated affected individuals. These results do not depend on ethnic matching or gender.
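The score and its permutation test can be sketched as follows. Two assumptions are ours, not the study's: the "standard measure of genetic relation" is stood in for by mean identity-by-state over SNPs coded as allele counts 0/1/2, and "permuting the affected status" is taken to mean swapping the two members of each discordant pair with probability 1/2. Function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Genotype = Sequence[int]  # allele counts (0, 1, 2) at a shared set of SNPs


def ibs_similarity(a: Genotype, b: Genotype) -> float:
    """Mean identity-by-state across SNPs (an assumed similarity measure)."""
    return sum(2 - abs(x - y) for x, y in zip(a, b)) / (2 * len(a))


def a2ds_test(tests: List[Genotype], pairs: List[Tuple[Genotype, Genotype]],
              n_perm: int = 10000, seed: int = 0) -> Tuple[float, float]:
    """Return (observed score, one-sided permutation P value)."""
    # Each pair's contribution, summed over all test individuals.
    deltas = [sum(ibs_similarity(t, aff) - ibs_similarity(t, unaff) for t in tests)
              for aff, unaff in pairs]
    observed = sum(deltas)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Swapping affected status within a pair flips the sign of its contribution.
        permuted = sum(d if rng.random() < 0.5 else -d for d in deltas)
        if permuted >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Precomputing each pair's contribution makes every permutation a cheap sign flip rather than a recomputation of all similarities.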
Subjects
Autistic Disorder/genetics, Autistic Disorder/physiopathology, Siblings, Autism Spectrum Disorder/genetics, Autism Spectrum Disorder/physiopathology, Child, Preschool Child, Cohort Studies, Computer Simulation, Family Health, Female, Genotype, Humans, Male, Statistical Models, Single Nucleotide Polymorphism, Sex Factors
ABSTRACT
Copy number variants (CNVs) underlie a significant amount of genetic diversity and disease. CNVs can be detected by a number of means, including chromosomal microarray analysis (CMA) and whole-genome sequencing (WGS), but these approaches either have limited resolution (CMA) or are highly expensive for routine screening (both CMA and WGS). As an alternative, we have developed a next-generation sequencing-based method for CNV analysis termed SMASH, for short multiply aggregated sequence homologies. SMASH utilizes random fragmentation of input genomic DNA to create chimeric sequence reads, from which multiple mappable tags can be parsed using maximal almost-unique matches (MAMs). The SMASH tags are then binned and segmented, generating a profile of genomic copy number at the desired resolution. Because SMASH needs fewer reads than WGS to give accurate CNV data, SMASH libraries can be highly multiplexed, allowing large numbers of individuals to be analyzed at low cost. Increased genomic resolution can be achieved by sequencing to higher depth.
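The binning step can be sketched as follows. This is a deliberately simplified illustration: it uses fixed-width bins and median normalization, omits the segmentation step and any mappability or GC correction, and assumes tags have already been mapped to `(chromosome, position)` pairs. The function names and the diploid scaling are hypothetical choices, not the SMASH software.

```python
from collections import Counter
from statistics import median
from typing import Dict, Iterable, Tuple

Bin = Tuple[str, int]  # (chromosome, bin index)


def bin_counts(tag_positions: Iterable[Tuple[str, int]], bin_size: int) -> Counter:
    """Count mapped tags falling into fixed-width bins."""
    return Counter((chrom, pos // bin_size) for chrom, pos in tag_positions)


def copy_number_ratios(counts: Counter, ploidy: int = 2) -> Dict[Bin, float]:
    """Scale bin counts so the median bin sits at `ploidy` copies.

    Only bins with at least one tag are present in `counts`; empty bins
    would need to be added explicitly in a real analysis.
    """
    mid = median(counts.values())
    return {key: ploidy * n / mid for key, n in counts.items()}
```

Fewer reads per sample simply means fewer tags per bin, so resolution can be traded against multiplexing by choosing `bin_size`, consistent with the point that deeper sequencing buys finer resolution.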
Subjects
Gene Dosage, DNA Sequence Analysis, Tumor Cell Line, Computational Biology, DNA Copy Number Variations, Female, Human Genome, High-Throughput Nucleotide Sequencing, Humans, Male, Software
ABSTRACT
Genome-wide analysis at the level of single cells has recently emerged as a powerful tool to dissect genome heterogeneity in cancer, neurobiology, and development. To be truly transformative, single-cell approaches must affordably accommodate large numbers of single cells. This is feasible in the case of copy number variation (CNV), because CNV determination requires only sparse sequence coverage. We have used a combination of bioinformatic and molecular approaches to optimize single-cell DNA amplification and library preparation for highly multiplexed sequencing, yielding a method that can produce genome-wide CNV profiles of up to a hundred individual cells on a single lane of an Illumina HiSeq instrument. We apply the method to human cancer cell lines and biopsied cancer tissue, thereby illustrating its efficiency, reproducibility, and power to reveal underlying genetic heterogeneity and clonal phylogeny. The capacity of the method to facilitate the rapid profiling of hundreds to thousands of single-cell genomes represents a key step in making single-cell profiling an easily accessible tool for studying cell lineage.
Subjects
DNA Copy Number Variations, Neoplasm DNA/genetics, Multiplex Polymerase Chain Reaction/methods, DNA Sequence Analysis/methods, Single-Cell Analysis/methods, Algorithms, Base Sequence, Tumor Cell Line, Human Genome, Humans, Molecular Sequence Data
ABSTRACT
We present Ginkgo (http://qb.cshl.edu/ginkgo), a user-friendly, open-source web platform for the analysis of single-cell copy-number variations (CNVs). Ginkgo automatically constructs copy-number profiles of cells from mapped reads and constructs phylogenetic trees of related cells. We validated Ginkgo by reproducing the results of five major studies. After comparing three commonly used single-cell amplification techniques, we concluded that degenerate oligonucleotide-primed PCR is the most consistent for CNV analysis.
Subjects
Computational Biology, DNA Copy Number Variations, Human Genome, Oligonucleotides/genetics, Algorithms, Animals, Automation, Cluster Analysis, Drosophila, Female, Gene Dosage, Genome, Humans, Internet, Lung Neoplasms/diagnosis, Lung Neoplasms/genetics, Male, Mice, Pan troglodytes, Phylogeny, Polymerase Chain Reaction, Rats, Reproducibility of Results, Sex Chromosomes, Small Cell Lung Carcinoma/diagnosis, Small Cell Lung Carcinoma/genetics, Software
ABSTRACT
We previously computed that genes with de novo (DN) likely gene-disruptive (LGD) mutations in children with autism spectrum disorders (ASD) have high vulnerability: disruptive mutations in many of these genes, the vulnerable autism genes, will have a high likelihood of resulting in ASD. Because individuals with ASD have lower fecundity, such mutations in autism genes would be under strong negative selection pressure. An immediate prediction is that these genes will have a lower LGD load than typical genes in the human gene pool. We confirm this hypothesis in an explicit test by measuring the load of disruptive mutations in whole-exome sequence databases from two cohorts. We use information about mutational load to show that affected individuals with lower and higher intelligence quotients (IQ) can be distinguished by the mutational load in their respective gene targets, as well as to help prioritize gene targets by their likelihood of being autism genes. Moreover, we demonstrate that transmission of rare disruptions in genes with a lower LGD load occurs more often to affected offspring; we show that transmission originates most often from the mother, and transmission of such variants is seen more often in offspring with lower IQ. A surprising proportion of transmission of these rare events comes from genes expressed in the embryonic brain that show sharply reduced expression shortly after birth.
Subjects
Autistic Disorder/genetics, Genetic Databases, Exome, Gene Pool, Genetic Models, Mutation, Child, Preschool Child, Female, Humans, Male
ABSTRACT
We present an open-source algorithm, Scalpel (http://scalpel.sourceforge.net/), which combines mapping and assembly for sensitive and specific discovery of insertions and deletions (indels) in exome-capture data. A detailed repeat analysis coupled with a self-tuning k-mer strategy allows Scalpel to outperform other state-of-the-art approaches for indel discovery, particularly in regions containing near-perfect repeats. We analyzed 593 families from the Simons Simplex Collection and demonstrated Scalpel's power to detect long (≥30 bp) transmitted events and enrichment for de novo likely gene-disrupting indels in autistic children.
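The intuition behind a self-tuning k-mer size can be sketched with a simplified stand-in: choose the smallest k for which a reference window contains no repeated k-mer, so the path it spells through a k-mer graph never revisits a node. This is our illustration of the idea, not Scalpel's actual repeat analysis; the candidate range and function names are hypothetical.

```python
from typing import Iterable, Optional


def has_repeated_kmer(seq: str, k: int) -> bool:
    """True if any k-mer occurs more than once in `seq`."""
    seen = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in seen:
            return True
        seen.add(kmer)
    return False


def choose_k(seq: str, candidates: Iterable[int] = range(10, 61, 5)) -> Optional[int]:
    """Smallest candidate k that makes every k-mer in the window unique."""
    for k in candidates:
        if k <= len(seq) and not has_repeated_kmer(seq, k):
            return k
    return None  # every candidate k still sees a repeat: flag the region


print(choose_k("ACGTACGTAC", candidates=range(2, 11)))  # 7
```

A small k keeps the graph sensitive to short reads and nearby variants, while a repeat forces k upward; regions where no candidate k works are exactly the near-perfect repeats in which indel calling is hardest.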
Subjects
DNA Mutational Analysis/methods, Exome, INDEL Mutation, Algorithms, Computational Biology/methods, DNA/chemistry, Genetic Databases, Humans, Mutation, Programming Languages, Sequence Alignment, Software
ABSTRACT
Genomic analysis provides insights into the role of copy number variation in disease, but most methods are not designed to resolve mixed populations of cells. In tumours, where genetic heterogeneity is common, very important information may be lost that would be useful for reconstructing evolutionary history. Here we show that with flow-sorted nuclei, whole genome amplification and next generation sequencing we can accurately quantify genomic copy number within an individual nucleus. We apply single-nucleus sequencing to investigate tumour population structure and evolution in two human breast cancer cases. Analysis of 100 single cells from a polygenomic tumour revealed three distinct clonal subpopulations that probably represent sequential clonal expansions. Additional analysis of 100 single cells from a monogenomic primary tumour and its liver metastasis indicated that a single clonal expansion formed the primary tumour and seeded the metastasis. In both primary tumours, we also identified an unexpectedly abundant subpopulation of genetically diverse 'pseudodiploid' cells that do not travel to the metastatic site. In contrast to gradual models of tumour progression, our data indicate that tumours grow by punctuated clonal expansions with few persistent intermediates.
Subjects
Breast Neoplasms/genetics, Breast Neoplasms/pathology, Molecular Evolution, DNA Sequence Analysis/methods, Single-Cell Analysis/methods, Breast Neoplasms/diagnosis, Ductal Breast Carcinoma/diagnosis, Ductal Breast Carcinoma/genetics, Ductal Breast Carcinoma/pathology, Chromosome Breakpoints, Clone Cells/cytology, Diploidy, Disease Progression, Female, Flow Cytometry, Genetic Heterogeneity, Human Genome/genetics, Genomics, Humans, Liver Neoplasms/genetics, Liver Neoplasms/secondary, Loss of Heterozygosity
ABSTRACT
Presently, inferring the long-range structure of the DNA templates is limited by short read lengths. Accurate template counts suffer from distortions occurring during PCR amplification. We explore the utility of introducing random mutations in identical or nearly identical templates to create distinguishable patterns that are inherited during subsequent copying. We simulate the applications of this process under assumptions of error-free sequencing and perfect mapping, using cytosine deamination as a model for mutation. The simulations demonstrate that within readily achievable conditions of nucleotide conversion and sequence coverage, we can accurately count the number of otherwise identical molecules as well as connect variants separated by long spans of identical sequence. We discuss many potential applications, such as transcript profiling, isoform assembly, haplotype phasing, and de novo genome assembly.
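A back-of-the-envelope calculation shows why random conversion can distinguish otherwise identical templates. Assuming each of n cytosines shared by two distinct templates is converted independently with probability p (our illustrative model, not a formula quoted from the study), the two templates carry the same pattern only if every cytosine is converted in both or in neither:

```latex
P(\text{same pattern}) = \left(p^{2} + (1-p)^{2}\right)^{n},
\qquad
p = \tfrac{1}{2} \;\Rightarrow\; P = 2^{-n},
\qquad
n = 20 \;\Rightarrow\; P = 2^{-20} \approx 9.5 \times 10^{-7}.
```

Since p² + (1 − p)² = 2p² − 2p + 1 is minimized at p = 1/2, a conversion rate near 50% gives the most distinguishable patterns per cytosine covered.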
Subjects
Mutagenesis/genetics, DNA Sequence Analysis, Genetic Templates, Algorithms, Base Sequence, Molecular Sequence Data
ABSTRACT
Finding regions of the genome that are significantly recurrent in noisy data is a common but difficult problem in present-day computational biology. Cores of recurrent events (CORE) is a computational approach to solving this problem that is based on a formalized notion by which "core" intervals explain the observed data, where the number of cores is the "depth" of the explanation. Given that formalization, we implement CORE as a combinatorial optimization procedure with depth chosen from considerations of statistical significance. An important feature of CORE is its ability to explain data with cores of widely varying lengths. We examine the performance of this system with synthetic data, and then provide two demonstrations of its utility with actual data. Applying CORE to a collection of DNA copy number profiles from single cells of a given tumor, we determine tumor population phylogeny and find the features that separate subpopulations. Applying CORE to comparative genomic hybridization data from a large set of tumor samples, we define regions of recurrent copy number aberration in breast cancer.
Subjects
Breast Neoplasms/genetics, Neoplastic Gene Expression Regulation, Genomics/methods, Genetic Models, Breast Neoplasms/secondary, Comparative Genomic Hybridization/methods, Computational Biology/methods, DNA Copy Number Variations/genetics, Genetic Databases, Female, Humans, Oligonucleotide Array Sequence Analysis/methods, Phylogeny, Software, Transcriptome
ABSTRACT
In situ detection of genomic alterations in cancer provides information at the single cell level, making it possible to investigate genomic changes in cells in a tissue context. Such topological information is important when studying intratumor heterogeneity as well as alterations related to different steps in tumor progression. We developed a quantitative multigene fluorescence in situ hybridization (QM FISH) method to detect multiple genomic regions in single cells in complex tissues. As a "proof of principle" we applied the method to breast cancer samples to identify partners in whole-arm (WA) translocations. WA gain of chromosome arm 1q and loss of chromosome arm 16q are among the most frequent genomic events in breast cancer. By designing five specific FISH probes based on breakpoint information from comparative genomic hybridization array (aCGH) profiles, we visualized chromosomal translocations in clinical samples at the single cell level. By analyzing aCGH data from 295 patients with breast carcinoma with known molecular subtype, we found concurrent WA gain of 1q and loss of 16q to be more frequent in luminal A tumors compared to other molecular subtypes. QM FISH applied to a subset of samples (n = 26) identified a derivative chromosome der(1;16)(q10;p10), a result of a centromere-close translocation between chromosome arms 1q and 16p. In addition, we observed that the distribution of cells with the translocation varied from sample to sample: some had a homogeneous cell population, while others displayed intratumor heterogeneity with cell-to-cell variation. Finally, for one tumor with both preinvasive and invasive components, the fraction of cells with the translocation was lower and more heterogeneous in the preinvasive tumor cells than in the invasive component.