Results 1 - 16 of 16
1.
Genome Res ; 29(3): 472-484, 2019 03.
Article in English | MEDLINE | ID: mdl-30737237

ABSTRACT

K562 is widely used in biomedical research. It is one of three tier-one cell lines of ENCODE and also most commonly used for large-scale CRISPR/Cas9 screens. Although its functional genomic and epigenomic characteristics have been extensively studied, its genome sequence and genomic structural features have never been comprehensively analyzed. Such information is essential for the correct interpretation and understanding of the vast troves of existing functional genomics and epigenomics data for K562. We performed and integrated deep-coverage whole-genome (short-insert), mate-pair, and linked-read sequencing as well as karyotyping and array CGH analysis to identify a wide spectrum of genome characteristics in K562: copy numbers (CN) of aneuploid chromosome segments at high resolution, SNVs and indels (both corrected for CN in aneuploid regions), loss of heterozygosity, megabase-scale phased haplotypes often spanning entire chromosome arms, structural variants (SVs), including small and large-scale complex SVs and nonreference retrotransposon insertions. Many SVs were phased, assembled, and experimentally validated. We identified multiple allele-specific deletions and duplications within the tumor suppressor gene FHIT. Taking aneuploidy into account, we reanalyzed K562 RNA-seq and whole-genome bisulfite sequencing data for allele-specific expression and allele-specific DNA methylation. We also show examples of how deeper insights into regulatory complexity are gained by integrating genomic variant information and structural context with functional genomics and epigenomics data. Furthermore, using K562 haplotype information, we produced an allele-specific CRISPR targeting map. This comprehensive whole-genome analysis serves as a resource for future studies that utilize K562 as well as a framework for the analysis of other cancer genomes.


Subjects
Human Genome , Humans , K562 Cells , Karyotype , Genetic Polymorphism , Whole Genome Sequencing
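
Note on the record above: the abstract says allele-specific expression was reanalyzed with aneuploidy taken into account. The reason this matters is arithmetic: if an allele sits on 2 of 3 chromosome copies, about 2/3 of reads are expected to carry it even without any allele-specific regulation, so testing against 0.5 would flag a false imbalance. The following is a minimal sketch of that idea in Python, assuming a simple exact binomial test. The function names and the read counts are hypothetical illustrations, and the abstract does not state which statistical procedure the authors actually used.

```python
from math import comb


def expected_alt_fraction(alt_copies: int, total_copies: int) -> float:
    """Null allele fraction implied by copy number (hypothetical helper)."""
    if total_copies <= 0 or not 0 <= alt_copies <= total_copies:
        raise ValueError("copy numbers must satisfy 0 <= alt_copies <= total_copies, total_copies > 0")
    return alt_copies / total_copies


def binomial_two_sided_p(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k under Binomial(n, p).
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so floating-point ties count as "as extreme".
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


# Illustrative (made-up) counts: 40 of 60 RNA-seq reads carry the alt allele.
alt_reads, total_reads = 40, 60

# Naive diploid null: each allele expected in half of the reads.
p_naive = binomial_two_sided_p(alt_reads, total_reads, 0.5)

# Copy-number-aware null: alt allele on 2 of 3 copies of the segment.
p_cn = binomial_two_sided_p(alt_reads, total_reads, expected_alt_fraction(2, 3))

print(f"naive null p-value: {p_naive:.4f}")
print(f"CN-aware null p-value: {p_cn:.4f}")
```

With these counts the naive test would call the site imbalanced, while the copy-number-aware test would not, because the observed fraction equals the fraction expected from copy number alone.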
2.
PLoS Comput Biol ; 16(6): e1007933, 2020 06.
Article in English | MEDLINE | ID: mdl-32559231

ABSTRACT

A high quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is more challenging. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app, SVCurator, to help GIAB curators manually review large indels and SVs within the human genome, and report their genotype and size accuracy. SVCurator displays images from short, long, and linked read sequencing data from the GIAB Ashkenazi Jewish Trio son [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. 'Expert' curators were 93% concordant with each other, and 37 of the 61 curators had at least 78% concordance with a set of 'expert' curators. The curators were least concordant for complex SVs and SVs that had inaccurate breakpoints or size predictions. After filtering events with low concordance among curators, we produced high confidence labels for 935 events. The SVCurator crowdsourced labels were 94.5% concordant with the heuristic-based draft benchmark SV callset from GIAB. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies.


Subjects
Human Genome , Genomic Structural Variation , Heuristics , Humans , INDEL Mutation
3.
Nucleic Acids Res ; 47(8): 3846-3861, 2019 05 07.
Article in English | MEDLINE | ID: mdl-30864654

ABSTRACT

HepG2 is one of the most widely used human cancer cell lines in biomedical research and one of the main cell lines of ENCODE. Although the functional genomic and epigenomic characteristics of HepG2 are extensively studied, its genome sequence has never been comprehensively analyzed and higher order genomic structural features are largely unknown. The high degree of aneuploidy in HepG2 renders traditional genome variant analysis methods challenging and partially ineffective. Correct and complete interpretation of the extensive functional genomics data from HepG2 requires an understanding of the cell line's genome sequence and genome structure. Using a variety of sequencing and analysis methods, we identified a wide spectrum of genome characteristics in HepG2: copy numbers of chromosomal segments at high resolution, SNVs and Indels (corrected for aneuploidy), regions with loss of heterozygosity, phased haplotypes extending to entire chromosome arms, retrotransposon insertions and structural variants (SVs) including complex and somatic genomic rearrangements. A large number of SVs were phased, sequence assembled and experimentally validated. We re-analyzed published HepG2 datasets for allele-specific expression and DNA methylation and assembled an allele-specific CRISPR/Cas9 targeting map. We demonstrate how deeper insights into genomic regulatory complexity are gained by adopting a genome-integrated framework.


Subjects
Chromosome Mapping/methods , Human Genome , Genomics/methods , Haplotypes , DNA Sequence Analysis/statistics & numerical data , Alleles , Aneuploidy , DNA Methylation , Genomic Structural Variation , Hep G2 Cells , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Karyotyping , Loss of Heterozygosity , Single Nucleotide Polymorphism , Retroelements
4.
Nat Methods ; 14(9): 915-920, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28714986

ABSTRACT

In read cloud approaches, microfluidic partitioning of long genomic DNA fragments and barcoding of shorter fragments derived from these fragments retains long-range information in short sequencing reads. This combination of short reads with long-range information represents a powerful alternative to single-molecule long-read sequencing. We develop Genome-wide Reconstruction of Complex Structural Variants (GROC-SVs) for SV detection and assembly from read cloud data and apply this method to Illumina-sequenced 10x Genomics sarcoma and breast cancer data sets. Compared with short-fragment sequencing, GROC-SVs substantially improves the specificity of breakpoint detection at comparable sensitivity. This approach also performs sequence assembly across multiple breakpoints simultaneously, enabling the reconstruction of events exhibiting remarkable complexity. We show that chromothriptic rearrangements occurred before copy number amplifications, and that rates of single-nucleotide variants and SVs are not correlated. Our results support the use of read cloud approaches to advance the characterization of large and complex structural variation.


Subjects
Algorithms , Chromosome Mapping/methods , DNA Mutational Analysis/methods , Genetic Variation/genetics , Genome/genetics , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods
5.
Trends Genet ; 31(4): 208-14, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25733351

ABSTRACT

Evolutionary mechanisms in cancer progression give tumors their individuality. Cancer evolution is different from organismal evolution, however, and we discuss where concepts from evolutionary genetics are useful or limited in facilitating an understanding of cancer. Based on these concepts we construct and apply the simplest plausible model of tumor growth and progression. Simulations using this simple model illustrate the importance of stochastic events early in tumorigenesis, highlight the dominance of exponential growth over linear growth and differentiation, and explain the clonal substructure of tumors.


Subjects
Neoplasms/genetics , Neoplasms/pathology , Animals , Cell Differentiation , Disease Progression , Genetic Heterogeneity , Humans , Biological Models , Mutation , Neoplasms/etiology
6.
Genes Dev ; 24(10): 992-1009, 2010 May 15.
Article in English | MEDLINE | ID: mdl-20413612

ABSTRACT

MicroRNAs (miRNAs) are small regulatory RNAs that derive from distinctive hairpin transcripts. To learn more about the miRNAs of mammals, we sequenced 60 million small RNAs from mouse brain, ovary, testes, embryonic stem cells, three embryonic stages, and whole newborns. Analysis of these sequences confirmed 398 annotated miRNA genes and identified 108 novel miRNA genes. More than 150 previously annotated miRNAs and hundreds of candidates failed to yield sequenced RNAs with miRNA-like features. Ectopically expressing these previously proposed miRNA hairpins also did not yield small RNAs, whereas ectopically expressing the confirmed and newly identified hairpins usually did yield small RNAs with the classical miRNA features, including dependence on the Drosha endonuclease for processing. These experiments, which suggest that previous estimates of conserved mammalian miRNAs were inflated, provide a substantially revised list of confidently identified murine miRNAs from which to infer the general features of mammalian miRNAs. Our analyses also revealed new aspects of miRNA biogenesis and modification, including tissue-specific strand preferences, sequential Dicer cleavage of a metazoan precursor miRNA (pre-miRNA), consequential 5' heterogeneity, newly identified instances of miRNA editing, and evidence for widespread pre-miRNA uridylation reminiscent of miRNA regulation by Lin28.


Subjects
Genes/genetics , Genome/genetics , MicroRNAs/genetics , Animals , Cell Line , Gene Expression Profiling , Humans , Inverted Repeat Sequences/genetics , Mice , MicroRNAs/biosynthesis , MicroRNAs/metabolism , Ribonuclease III/metabolism
7.
Mol Cell ; 36(2): 245-54, 2009 Oct 23.
Article in English | MEDLINE | ID: mdl-19854133

ABSTRACT

Core RNA-processing reactions in eukaryotic cells occur cotranscriptionally in a chromatin context, but the relationship between chromatin structure and pre-mRNA processing is poorly understood. We observed strong nucleosome depletion around human polyadenylation sites (PAS) and nucleosome enrichment just downstream of PAS. In genes with multiple alternative PAS, higher downstream nucleosome affinity was associated with higher PAS usage, independently of known PAS motifs that function at the RNA level. Conversely, exons were associated with distinct peaks in nucleosome density. Exons flanked by long introns or weak splice sites exhibited stronger nucleosome enrichment, and incorporation of nucleosome density data improved splicing simulation accuracy. Certain histone modifications, including H3K36me3 and H3K27me2, were specifically enriched on exons, suggesting active marking of exon locations at the chromatin level. Together, these findings provide evidence for extensive functional connections between chromatin structure and RNA processing.


Subjects
Chromatin/genetics , Exons/genetics , Polyadenylation/genetics , Base Composition/genetics , Histones/metabolism , Humans , Introns/genetics , Methylation , Nucleosomes/metabolism , RNA Splice Sites/genetics , RNA Splicing/genetics , Nucleic Acid Regulatory Sequences/genetics
8.
BMC Genomics ; 17: 64, 2016 Jan 16.
Article in English | MEDLINE | ID: mdl-26772178

ABSTRACT

BACKGROUND: The human genome contains variants ranging in size from small single nucleotide polymorphisms (SNPs) to large structural variants (SVs). High-quality benchmark small variant calls for the pilot National Institute of Standards and Technology (NIST) Reference Material (NA12878) have been developed by the Genome in a Bottle Consortium, but no similar high-quality benchmark SV calls exist for this genome. Since SV callers output highly discordant results, we developed methods to combine multiple forms of evidence from multiple sequencing technologies to classify candidate SVs into likely true or false positives. Our method (svclassify) calculates annotations from one or more aligned bam files from many high-throughput sequencing technologies, and then builds a one-class model using these annotations to classify candidate SVs as likely true or false positives. RESULTS: We first used pedigree analysis to develop a set of high-confidence breakpoint-resolved large deletions. We then used svclassify to cluster and classify these deletions as well as a set of high-confidence deletions from the 1000 Genomes Project and a set of breakpoint-resolved complex insertions from Spiral Genetics. We find that likely SVs cluster separately from likely non-SVs based on our annotations, and that the SVs cluster into different types of deletions. We then developed a supervised one-class classification method that uses a training set of random non-SV regions to determine whether candidate SVs have abnormal annotations different from most of the genome. To test this classification method, we use our pedigree-based breakpoint-resolved SVs, SVs validated by the 1000 Genomes Project, and assembly-based breakpoint-resolved insertions, along with semi-automated visualization using svviz. 
CONCLUSIONS: We find that candidate SVs with high scores from multiple technologies have high concordance with PCR validation and an orthogonal consensus method MetaSV (99.7% concordant), and candidate SVs with low scores are questionable. We distribute a set of 2676 high-confidence deletions and 68 high-confidence insertions with high svclassify scores from these call sets for benchmarking SV callers. We expect these methods to be particularly useful for establishing high-confidence SV calls for benchmark samples that have been characterized by multiple technologies.


Subjects
Human Genome , Genomic Structural Variation , Software , Benchmarking , Genomics , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation , Pedigree , Single Nucleotide Polymorphism/genetics
9.
Genome Res ; 23(12): 2078-90, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24072873

ABSTRACT

Variation in protein output across the genome is controlled at several levels, but the relative contributions of different regulatory mechanisms remain poorly understood. Here, we obtained global measurements of decay and translation rates for mRNAs with alternative 3' untranslated regions (3' UTRs) in murine 3T3 cells. Distal tandem isoforms had slightly but significantly lower mRNA stability and greater translational efficiency than proximal isoforms on average. The diversity of alternative 3' UTRs also enabled inference and evaluation of both positively and negatively acting cis-regulatory elements. The 3' UTR elements with the greatest implied influence were microRNA complementary sites, which were associated with repression of 32% and 4% at the stability and translational levels, respectively. Nonetheless, both the decay and translation rates were highly correlated for proximal and distal 3' UTR isoforms from the same genes, implying that in 3T3 cells, alternative 3' UTR sequences play a surprisingly small regulatory role compared to other mRNA regions.


Subjects
3' Untranslated Regions , Protein Biosynthesis , RNA Isoforms , RNA Stability , 3T3 Cells , Alternative Splicing , Animals , Fibroblasts/metabolism , Gene Expression Regulation , Genome , Linear Models , Mice , MicroRNAs/metabolism , Molecular Conformation , Nucleotide Motifs , Open Reading Frames , Polyadenylation , Messenger RNA/metabolism , Transcriptional Regulatory Elements , RNA Sequence Analysis
10.
Bioinformatics ; 31(24): 3994-6, 2015 Dec 15.
Article in English | MEDLINE | ID: mdl-26286809

ABSTRACT

Visualizing read alignments is the most effective way to validate candidate structural variants (SVs) with existing data. We present svviz, a sequencing read visualizer for SVs that sorts and displays only reads relevant to a candidate SV. svviz works by searching input bam(s) for potentially relevant reads, realigning them against the inferred sequence of the putative variant allele as well as the reference allele and identifying reads that match one allele better than the other. Separate views of the two alleles are then displayed in a scrollable web browser view, enabling a more intuitive visualization of each allele, compared with the single reference genome-based view common to most current read browsers. The browser view facilitates examining the evidence for or against a putative variant, estimating zygosity, visualizing affected genomic annotations and manual refinement of breakpoints. svviz supports data from most modern sequencing platforms. AVAILABILITY AND IMPLEMENTATION: svviz is implemented in python and freely available from http://svviz.github.io/.


Subjects
Genomic Structural Variation , Genomics/methods , Software , Alleles , Sequence Alignment
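
Note on the record above: the abstract describes realigning each read against both the reference allele and the inferred variant allele, then keeping reads that match one allele better than the other. The following toy sketch in Python illustrates that assignment step only. It is not svviz code and does not use the svviz API: the scoring function is a deliberately simple ungapped match count swapped in for a real aligner, and every name here (`best_ungapped_score`, `assign_read`, `min_margin`, the sequences) is a hypothetical illustration.

```python
def best_ungapped_score(read: str, allele: str) -> int:
    """Highest number of matching bases over all ungapped placements
    of `read` inside `allele` (a stand-in for a real alignment score)."""
    if len(read) > len(allele):
        return 0
    best = 0
    for start in range(len(allele) - len(read) + 1):
        window = allele[start:start + len(read)]
        matches = sum(1 for a, b in zip(read, window) if a == b)
        best = max(best, matches)
    return best


def assign_read(read: str, ref_allele: str, alt_allele: str, min_margin: int = 2) -> str:
    """Label a read 'ref', 'alt', or 'ambiguous' depending on which
    allele sequence it matches better by at least `min_margin` bases."""
    ref_score = best_ungapped_score(read, ref_allele)
    alt_score = best_ungapped_score(read, alt_allele)
    if alt_score - ref_score >= min_margin:
        return "alt"
    if ref_score - alt_score >= min_margin:
        return "ref"
    return "ambiguous"


# Toy candidate deletion: the alt allele lacks the "GGGGGG" segment.
REF = "ACGTACGTTAGGGGGGCCATTGCA"
ALT = "ACGTACGTTACCATTGCA"

reads = {
    "spans_breakpoint": "CGTTACCATT",   # only contiguous in the alt allele
    "inside_deletion": "TTAGGGGGGC",    # only present in the ref allele
    "flanking_only": "ACGTACGTTA",      # identical in both alleles
}

for name, seq in reads.items():
    print(name, assign_read(seq, REF, ALT))
```

Reads that lie entirely in sequence shared by both alleles score equally and stay ambiguous, which is why only reads near the breakpoints carry evidence for or against the variant.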
12.
Nat Biotechnol ; 38(11): 1347-1355, 2020 11.
Article in English | MEDLINE | ID: mdl-32541955

ABSTRACT

New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed a sequence-resolved benchmark set for identification of both false-negative and false-positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12,745 isolated, sequence-resolved insertion (7,281) and deletion (5,464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and 5,262 insertions and 4,095 deletions supported by ≥1 diploid assembly. We demonstrate that the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked- and long-read sequencing and optical mapping.


Subjects
Germ-Line Mutation/genetics , INDEL Mutation/genetics , Diploidy , Genomic Structural Variation , Humans , Molecular Sequence Annotation , DNA Sequence Analysis
13.
Sci Data ; 3: 160025, 2016 Jun 07.
Article in English | MEDLINE | ID: mdl-27271295

ABSTRACT

The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST) is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly.


Subjects
Benchmarking , Human Genome , Exome , Genomics , Humans , INDEL Mutation
14.
Elife ; 4: e05538, 2015 Apr 14.
Article in English | MEDLINE | ID: mdl-25871848

ABSTRACT

The effects of genetic variation on gene regulation in the developing mammalian embryo remain largely unexplored. To globally quantify these effects, we crossed two divergent mouse strains and asked how genotype of the mother or of the embryo drives gene expression phenotype genomewide. Embryonic expression of 331 genes depends on the genotype of the mother. Embryonic genotype controls allele-specific expression of 1594 genes and a highly overlapping set of cis-expression quantitative trait loci (eQTL). A marked paucity of trans-eQTL suggests that the widespread expression differences do not propagate through the embryonic gene regulatory network. The cis-eQTL genes exhibit lower-than-average evolutionary conservation and are depleted for developmental regulators, consistent with purifying selection acting on expression phenotype of pattern formation genes. The widespread effect of maternal and embryonic genotype in conjunction with the purifying selection we uncovered suggests that embryogenesis is an important and understudied reservoir of phenotypic variation.


Subjects
Embryonic Development/genetics , Developmental Gene Expression Regulation , Gene Regulatory Networks , Inheritance Patterns , Quantitative Trait Loci , Alleles , Animals , Biological Evolution , Genetic Crosses , Mammalian Embryo , Female , Gene Expression Profiling , Genetic Variation , Genotype , Male , Mice , Phenotype
15.
Genome Med ; 7(1): 28, 2015.
Article in English | MEDLINE | ID: mdl-25918554

ABSTRACT

BACKGROUND: All cells in an individual are related to one another by a bifurcating lineage tree, in which each node is an ancestral cell that divided into two, each branch connects two nodes, and the root is the zygote. When a somatic mutation occurs in an ancestral cell, all its descendants carry the mutation, which can then serve as a lineage marker for the phylogenetic reconstruction of tumor progression. Using this concept, we investigate cell lineage relationships and genetic heterogeneity of pre-invasive neoplasias compared to invasive carcinomas. METHODS: We deeply sequenced over a thousand phylogenetically informative somatic variants in 66 morphologically independent samples from six patients that represent a spectrum of normal, early neoplasia, carcinoma in situ, and invasive carcinoma. For each patient, we obtained a highly resolved lineage tree that establishes the phylogenetic relationships among the pre-invasive lesions and with the invasive carcinoma. RESULTS: The trees reveal lineage heterogeneity of pre-invasive lesions, both within the same lesion, and between histologically similar ones. On the basis of the lineage trees, we identified a large number of independent recurrences of PIK3CA H1047 mutations in separate lesions in four of the six patients, often separate from the diagnostic carcinoma. CONCLUSIONS: Our analyses demonstrate that multi-sample phylogenetic inference provides insights on the origin of driver mutations, lineage heterogeneity of neoplastic proliferations, and the relationship of genomically aberrant neoplasias with the primary tumors. PIK3CA driver mutations may be comparatively benign inducers of cellular proliferation.

16.
Nat Struct Mol Biol ; 15(10): 1015-23, 2008 Oct.
Article in English | MEDLINE | ID: mdl-18776903

ABSTRACT

In the fission yeast Schizosaccharomyces pombe, the RNA interference (RNAi) machinery is required to generate small interfering RNAs (siRNAs) that mediate heterochromatic gene silencing. Efficient silencing also requires the TRAMP complex, which contains the noncanonical Cid14 poly(A) polymerase and targets aberrant RNAs for degradation. Here we use high-throughput sequencing to analyze Argonaute-associated small RNAs (sRNAs) in both the presence and absence of Cid14. Most sRNAs in fission yeast start with a 5' uracil, and we argue these are loaded most efficiently into Argonaute. In wild-type cells most sRNAs match to repeated regions of the genome, whereas in cid14Δ cells the sRNA profile changes to include major new classes of sRNAs originating from ribosomal RNAs and a tRNA. Thus, Cid14 prevents certain abundant RNAs from becoming substrates for the RNAi machinery, thereby freeing the RNAi machinery to act on its proper targets.


Subjects
Small Interfering RNA/genetics , Schizosaccharomyces/enzymology , Schizosaccharomyces/genetics , Serine Endopeptidases/metabolism , Argonaute Proteins , Cell Cycle Proteins/genetics , Cell Cycle Proteins/metabolism , Centromere/genetics , Antisense DNA/genetics , Gene Deletion , Fungal Genome/genetics , Histone-Lysine N-Methyltransferase , Methyltransferases/genetics , Methyltransferases/metabolism , Nucleotides/genetics , Polynucleotide Adenylyltransferase/genetics , Polynucleotide Adenylyltransferase/metabolism , Protein Binding , Ribosomal RNA/genetics , RNA-Binding Proteins , Schizosaccharomyces pombe Proteins/genetics , Schizosaccharomyces pombe Proteins/metabolism