Results 1 - 8 of 8
1.
Nat Commun ; 11(1): 3400, 2020 Jul 07.
Article in English | MEDLINE | ID: mdl-32636365

ABSTRACT

The Pan-Cancer Analysis of Whole Genomes (PCAWG) project generated a vast amount of whole-genome cancer sequencing resource data. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumor types, we provide a user's guide to the five publicly available online data exploration and visualization tools introduced in the PCAWG marker paper. These tools are ICGC Data Portal, UCSC Xena, Chromothripsis Explorer, Expression Atlas, and PCAWG-Scout. We detail use cases and analyses for each tool, show how they incorporate outside resources from the larger genomics ecosystem, and demonstrate how the tools can be used together to understand the biology of cancers more deeply. Together, the tools enable researchers to query the complex genomic PCAWG data dynamically and integrate external information, enabling and enhancing interpretation.


Subjects
Computational Biology/methods; Genome, Human; Neoplasms/genetics; Chromothripsis; Data Analysis; Databases, Genetic; Genomics; Humans; Internet; Mutation; Software; User-Computer Interface; Whole Genome Sequencing
2.
PLoS Genet ; 14(5): e1007392, 2018 May.
Article in English | MEDLINE | ID: mdl-29768410

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pgen.1000832.].

3.
Nat Genet ; 43(4): 365-9, 2011 Feb 27.
Article in English | MEDLINE | ID: mdl-21358634

ABSTRACT

Multiple self-healing squamous epithelioma (MSSE), also known as Ferguson-Smith disease (FSD), is an autosomal-dominant skin cancer condition characterized by multiple squamous-carcinoma-like locally invasive skin tumors that grow rapidly for a few weeks before spontaneously regressing, leaving scars. High-throughput genomic sequencing of a conservative estimate (24.2 Mb) of the disease locus on chromosome 9 using exon array capture identified independent mutations in TGFBR1 in three unrelated families. Subsequent dideoxy sequencing of TGFBR1 identified 11 distinct monoallelic mutations in 18 affected families, firmly establishing TGFBR1 as the causative gene. The nature of the sequence variants, which include mutations in the extracellular ligand-binding domain and a series of truncating mutations in the kinase domain, indicates a clear genotype-phenotype correlation between loss-of-function TGFBR1 mutations and MSSE. This distinguishes MSSE from the Marfan syndrome-related disorders in which missense mutations in TGFBR1 lead to developmental defects with vascular involvement but no reported predisposition to cancer.


Subjects
Mutation; Protein Serine-Threonine Kinases/genetics; Receptors, Transforming Growth Factor beta/genetics; Skin Neoplasms/genetics; Amino Acid Sequence; Base Sequence; Carcinoma/genetics; Carcinoma/metabolism; Codon, Nonsense; Conserved Sequence; DNA Primers/genetics; Female; Frameshift Mutation; Genetic Association Studies; Haplotypes; Humans; Keratoacanthoma/genetics; Keratoacanthoma/metabolism; Male; Marfan Syndrome/genetics; Models, Molecular; Molecular Sequence Data; Mutant Proteins/chemistry; Mutant Proteins/genetics; Mutant Proteins/metabolism; Mutation, Missense; Protein Serine-Threonine Kinases/chemistry; Protein Serine-Threonine Kinases/metabolism; Protein Structure, Tertiary; Receptor, Transforming Growth Factor-beta Type I; Receptors, Transforming Growth Factor beta/chemistry; Receptors, Transforming Growth Factor beta/metabolism; Sequence Homology, Amino Acid; Skin Neoplasms/metabolism
4.
PLoS Genet ; 6(1): e1000832, 2010 Jan 29.
Article in English | MEDLINE | ID: mdl-20126413

ABSTRACT

U87MG is a commonly studied grade IV glioma cell line that has been analyzed in at least 1,700 publications over four decades. In order to comprehensively characterize the genome of this cell line and to serve as a model of broad cancer genome sequencing, we have generated greater than 30x genomic sequence coverage using a novel 50-base mate-paired strategy with a 1.4 kb mean insert library. A total of 1,014,984,286 mate-end and 120,691,623 single-end two-base encoded reads were generated from five slides. All data were aligned using a custom designed tool called BFAST, allowing optimal color space read alignment and accurate identification of DNA variants. The aligned sequence reads and mate-pair information identified 35 interchromosomal translocation events, 1,315 structural variations (>100 bp), 191,743 small (<21 bp) insertions and deletions (indels), and 2,384,470 single nucleotide variations (SNVs). Among these observations, the known homozygous mutation in PTEN was robustly identified, and genes involved in cell adhesion were overrepresented in the mutated gene list. Data were compared to 219,187 heterozygous single nucleotide polymorphisms assayed by Illumina 1M Duo genotyping array to assess accuracy: 93.83% of all SNPs were reliably detected at filtering thresholds that yield greater than 99.99% sequence accuracy. Protein-coding sequences were disrupted predominantly in this cancer cell line due to small indels, large deletions, and translocations. In total, 512 genes were homozygously mutated, including 154 by SNVs, 178 by small indels, 145 by large microdeletions, and 35 by interchromosomal translocations to reveal a highly mutated cell line genome. Of the small homozygously mutated variants, 8 SNVs and 99 indels were novel events not present in dbSNP. These data demonstrate that routine generation of broad cancer genome sequence is possible outside of genome centers. 
The sequence analysis of U87MG provides an unparalleled level of mutational resolution compared to any cell line to date.
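The accuracy check in this abstract compares sequencing calls against sites genotyped as heterozygous on an array and reports the fraction detected. A minimal sketch of that concordance calculation follows; the function name and the `(chromosome, position)` site key are assumptions for illustration, and matching on position alone is a simplification (a fuller check would also compare alleles).

```python
from typing import Iterable, Set, Tuple

# A site is keyed by (chromosome, position). Matching on position alone is a
# simplification: a fuller check would also compare the called alleles.
Site = Tuple[str, int]


def array_concordance(array_het_sites: Iterable[Site], called_sites: Set[Site]) -> float:
    """Return the fraction of array heterozygous sites also found among sequencing calls."""
    sites = set(array_het_sites)
    if not sites:
        raise ValueError("no array sites supplied")
    detected = sum(1 for site in sites if site in called_sites)
    return detected / len(sites)


if __name__ == "__main__":
    array = {("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr3", 10)}
    calls = {("chr1", 100), ("chr2", 50), ("chr3", 10), ("chrX", 5)}
    print(array_concordance(array, calls))  # 0.75: 3 of the 4 array sites were called
```

Applied to the abstract's figures, 93.83% of 219,187 array sites corresponds to roughly 205,660 detected sites; because the percentage is rounded, the exact count cannot be recovered from it.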


Subjects
Cell Line, Tumor/chemistry; Genome, Human; Glioma/genetics; Cell Line, Tumor/cytology; Genotype; Humans; Molecular Sequence Data; Mutation; Polymorphism, Single Nucleotide; Proteins/genetics; Sequence Analysis, DNA
5.
BMC Bioinformatics ; 11 Suppl 12: S2, 2010 Dec 21.
Article in English | MEDLINE | ID: mdl-21210981

ABSTRACT

BACKGROUND: Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. RESULTS: In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). CONCLUSIONS: The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. 
This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a natural fit for storing and searching ever-growing genome sequence datasets.
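The abstract describes loading variants into HBase and querying them by genomic region. Sorted key-value stores such as HBase keep rows ordered by row key, so a region query can become a range scan if positions are encoded to sort correctly. The sketch below imitates that idea in memory with `bisect`; the `contig:zero-padded-position` key layout and the `VariantStore` class are hypothetical illustrations, not the SeqWare Query Engine's actual schema or API.

```python
import bisect
from typing import Dict, List, Tuple


def row_key(contig: str, position: int) -> str:
    # Zero-pad the position so lexicographic key order matches numeric order,
    # the property a range scan over a sorted key-value table relies on.
    return f"{contig}:{position:012d}"


class VariantStore:
    """Toy in-memory stand-in for a sorted key-value table (hypothetical schema)."""

    def __init__(self) -> None:
        self._keys: List[str] = []      # row keys kept in sorted order
        self._rows: Dict[str, dict] = {}  # row key -> annotation columns

    def put(self, contig: str, position: int, annotations: dict) -> None:
        key = row_key(contig, position)
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = annotations

    def scan(self, contig: str, start: int, end: int) -> List[Tuple[str, dict]]:
        """Return rows with start <= position < end on one contig, in key order."""
        lo = bisect.bisect_left(self._keys, row_key(contig, start))
        hi = bisect.bisect_left(self._keys, row_key(contig, end))
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


if __name__ == "__main__":
    store = VariantStore()
    store.put("chr1", 150, {"type": "SNV", "coverage": 31})
    store.put("chr1", 250, {"type": "indel", "coverage": 18})
    store.put("chr1", 350, {"type": "SNV", "coverage": 27})
    store.put("chr10", 200, {"type": "SNV", "coverage": 40})
    for key, row in store.scan("chr1", 100, 300):
        print(key, row)
```

Without the zero padding, position 1000 would sort before position 200 and a contiguous scan would return the wrong rows; the `:` separator also keeps `chr10` keys from falling inside a `chr1` range.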


Subjects
Genomics/methods; Software; Databases, Nucleic Acid; Genome, Human; High-Throughput Nucleotide Sequencing; Humans; Sequence Analysis, DNA/methods
6.
BMC Genomics ; 10: 646, 2009 Dec 31.
Article in English | MEDLINE | ID: mdl-20043857

ABSTRACT

BACKGROUND: The emergence of next-generation sequencing technology presents tremendous opportunities to accelerate the discovery of rare variants or mutations that underlie human genetic disorders. Although the complete sequencing of the affected individuals' genomes would be the most powerful approach to finding such variants, the cost of such efforts makes it impractical for routine use in disease gene research. In cases where candidate genes or loci can be defined by linkage, association, or phenotypic studies, the practical sequencing target can be made much smaller than the whole genome, and it becomes critical to have capture methods that can be used to purify the desired portion of the genome for shotgun short-read sequencing without biasing allelic representation or coverage. One major approach is array-based capture, which relies on the ability to create a custom in-situ synthesized oligonucleotide microarray for use as a collection of hybridization capture probes. This approach is being used by our group and others routinely and we are continuing to improve its performance. RESULTS: Here, we provide a complete protocol optimized for large aggregate sequence intervals and demonstrate its utility with the capture of all predicted amino acid coding sequence from 3,038 human genes using 241,700 60-mer oligonucleotides. Further, we demonstrate two techniques by which the efficiency of the capture can be increased: by introducing a step to block cross-hybridization mediated by common adapter sequences used in sequencing library construction, and by repeating the hybridization capture step. These improvements can boost the targeting efficiency to the point where over 85% of the mapped sequence reads fall within 100 bases of the targeted regions. CONCLUSIONS: The complete protocol introduced in this paper enables researchers to perform practical capture experiments, and includes two novel methods for increasing the targeting efficiency. 
Coupled with the new massively parallel sequencing technologies, this provides a powerful approach to identifying disease-causing genetic variants that can be localized within the genome by traditional methods.
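Targeting efficiency as reported here is the share of mapped reads that fall within 100 bases of a targeted region. A minimal sketch of that calculation follows, assuming each read is summarized by one chromosome and position (for example its alignment start) and that targets are 0-based half-open intervals; these conventions and the function names are assumptions for illustration, not the authors' pipeline.

```python
import bisect
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based (assumed convention)


def merge_padded(targets: List[Interval], flank: int) -> List[Interval]:
    """Pad each target by `flank` bases on both sides and merge overlapping intervals."""
    padded = sorted((max(0, s - flank), e + flank) for s, e in targets)
    merged: List[Interval] = []
    for s, e in padded:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def targeting_efficiency(reads: List[Tuple[str, int]],
                         targets: Dict[str, List[Interval]],
                         flank: int = 100) -> float:
    """Fraction of mapped reads whose position lies within `flank` bases of any target."""
    index = {chrom: merge_padded(ivs, flank) for chrom, ivs in targets.items()}
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in index.items()}
    on_target = 0
    for chrom, pos in reads:
        # Find the last padded interval starting at or before pos, then test its end.
        i = bisect.bisect_right(starts.get(chrom, []), pos) - 1
        if i >= 0 and pos < index[chrom][i][1]:
            on_target += 1
    return on_target / len(reads) if reads else 0.0


if __name__ == "__main__":
    targets = {"chr1": [(1000, 1200)]}
    reads = [("chr1", 950), ("chr1", 1299), ("chr1", 1300), ("chr2", 1000)]
    print(targeting_efficiency(reads, targets))  # 0.5
```

Merging the padded intervals first keeps each lookup to a single binary search even when neighbouring exons sit within 200 bases of each other.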


Subjects
Genetic Loci; Genome, Human; Oligonucleotide Array Sequence Analysis/methods; Sequence Analysis, DNA/methods; DNA, Neoplasm/genetics; Genes, Neoplasm; Genomic Library; Humans; Sequence Alignment
7.
FEBS J ; 272(20): 5110-8, 2005 Oct.
Article in English | MEDLINE | ID: mdl-16218945

ABSTRACT

The wealth of available genomic data has spawned a corresponding interest in computational methods that can impart biological meaning and context to these experiments. Traditional computational methods have drawn relationships between pairs of proteins or genes based on notions of equality or similarity between their patterns of occurrence or behavior. For example, two genes displaying similar variation in expression, over a number of experiments, may be predicted to be functionally related. We have introduced a natural extension of these approaches, instead identifying logical relationships involving triplets of proteins. Triplets provide for various discrete kinds of logic relationships, leading to detailed inferences about biological associations. For instance, a protein C might be encoded within an organism if, and only if, two other proteins A and B are also both encoded within the organism, thus suggesting that gene C is functionally related to genes A and B. The method has been applied fruitfully to both phylogenetic and microarray expression data, and has been used to associate logical combinations of protein activity with disease state phenotypes, revealing previously unknown ternary relationships among proteins, and illustrating the inherent complexities that arise in biological data.
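The triplet idea (protein C is encoded if, and only if, A and B are both encoded) can be illustrated over phylogenetic presence/absence profiles. The sketch below scores a few candidate logic functions by a plain agreement fraction. It is not the scoring statistic used in the paper, and because it does not correct for how common each gene is, high agreement can arise by chance for rare or near-ubiquitous genes. The function names and the four candidate relations are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

# Candidate relations "C is present iff f(A, B)"; this short list is illustrative.
LOGIC: Dict[str, Callable[[bool, bool], bool]] = {
    "A AND B": lambda a, b: a and b,
    "A OR B": lambda a, b: a or b,
    "A XOR B": lambda a, b: a != b,
    "A AND NOT B": lambda a, b: a and not b,
}


def agreement(a: Sequence[bool], b: Sequence[bool], c: Sequence[bool],
              f: Callable[[bool, bool], bool]) -> float:
    """Fraction of genomes in which presence of C equals f(presence of A, presence of B)."""
    if not a or not (len(a) == len(b) == len(c)):
        raise ValueError("profiles must be non-empty and of equal length")
    return sum(ci == f(ai, bi) for ai, bi, ci in zip(a, b, c)) / len(a)


def best_relation(a: Sequence[bool], b: Sequence[bool],
                  c: Sequence[bool]) -> Tuple[str, float]:
    """Return the candidate relation that best explains C, with its agreement score."""
    return max(((name, agreement(a, b, c, f)) for name, f in LOGIC.items()),
               key=lambda item: item[1])


if __name__ == "__main__":
    # Presence (True) or absence (False) of each gene across six hypothetical genomes.
    a = [True, True, False, False, True, False]
    b = [True, False, True, False, True, False]
    c = [True, False, False, False, True, False]
    print(best_relation(a, b, c))  # ('A AND B', 1.0)
```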


Subjects
Cell Physiological Phenomena; Computational Biology/methods; Databases, Genetic; Algorithms; Animals; Gene Expression Profiling; Glioma/genetics; Humans; Models, Biological; Oligonucleotide Array Sequence Analysis; Phylogeny; Proteins/genetics; Proteins/physiology
8.
Nucleic Acids Res ; 32(Web Server issue): W360-4, 2004 Jul 01.
Article in English | MEDLINE | ID: mdl-15215411

ABSTRACT

The Genomic Disulfide Analysis Program (GDAP) provides web access to computationally predicted protein disulfide bonds for over one hundred microbial genomes, including both bacterial and archaeal species. In the GDAP process, sequences of unknown structure are mapped, when possible, to known homologous Protein Data Bank (PDB) structures, after which specific distance criteria are applied to predict disulfide bonds. GDAP also accepts user-supplied protein sequences and subsequently queries the PDB sequence database for the best matches, scans for possible disulfide bonds and returns the results to the client. These predictions are useful for a variety of applications and have previously been used to show a dramatic preference in certain thermophilic archaea and bacteria for disulfide bonds within intracellular proteins. Given the central role these stabilizing, covalent bonds play in such organisms, the predictions available from GDAP provide a rich data source for designing site-directed mutants with more stable thermal profiles. The GDAP web application is a gateway to this information and can be used to understand the role disulfide bonds play in protein stability both in these unusual organisms and in sequences of interest to the individual researcher. The prediction server can be accessed at http://www.doe-mbi.ucla.edu/Services/GDAP.
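The final GDAP step applies distance criteria to cysteine pairs once coordinates are available from a homologous PDB structure. The abstract does not state those criteria, so the sketch below uses a single hypothetical cutoff on the distance between cysteine SG (gamma sulfur) atoms. Observed disulfide S-S bonds are about 2.0 to 2.1 Å long; the 2.5 Å cutoff is an assumption, not GDAP's value, and the homology-mapping step is omitted.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]

# Hypothetical cutoff in angstroms; GDAP's actual distance criteria are not given in the abstract.
SG_SG_CUTOFF = 2.5


def predict_disulfides(sg_coords: Dict[int, Coord],
                       cutoff: float = SG_SG_CUTOFF) -> List[Tuple[int, int, float]]:
    """Return (residue_i, residue_j, distance) for cysteine pairs whose SG atoms lie within `cutoff`."""
    hits: List[Tuple[int, int, float]] = []
    # Sorting by residue number makes the output deterministic, with i < j in each pair.
    for (i, p), (j, q) in combinations(sorted(sg_coords.items()), 2):
        d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
        if d <= cutoff:
            hits.append((i, j, d))
    return hits


if __name__ == "__main__":
    # SG coordinates keyed by residue number (made-up values for illustration).
    sg = {10: (0.0, 0.0, 0.0), 42: (2.04, 0.0, 0.0), 77: (10.0, 0.0, 0.0)}
    print(predict_disulfides(sg))  # one candidate pair: residues 10 and 42
```

A single distance test can let one cysteine pair with several partners; a fuller predictor would enforce one partner per cysteine and may add further geometric checks.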


Subjects
Archaeal Proteins/chemistry; Bacterial Proteins/chemistry; Cysteine/analysis; Disulfides/analysis; Software; Data Interpretation, Statistical; Genome, Archaeal; Genome, Bacterial; Internet; Sequence Analysis, Protein