Search | BVS IEC

Genoviz Software Development Kit: Java tool kit for building genomics visualization applications.

Helt, Gregg A; Nicol, John W; Erwin, Ed; Blossom, Eric; Blanchard, Steven G; Chervitz, Stephen A; Harmon, Cyrus; Loraine, Ann E.

BMC Bioinformatics ; 10: 266, 2009 Aug 25.

Article in English | MEDLINE | ID: mdl-19706180

ABSTRACT

BACKGROUND: Visualization software can expose previously undiscovered patterns in genomic data and advance biological science. RESULTS: The Genoviz Software Development Kit (SDK) is an open source, Java-based framework designed for rapid assembly of visualization software applications for genomics. The Genoviz SDK framework provides a mechanism for incorporating adaptive, dynamic zooming into applications, a desirable feature of genome viewers. Visualization capabilities of the Genoviz SDK include automated layout of features along genetic or genomic axes; support for user interactions with graphical elements (Glyphs) in a map; a variety of Glyph sub-classes that promote experimentation with new ways of representing data in graphical formats; and support for adaptive, semantic zooming, whereby objects change their appearance depending on zoom level and zooming rate adapts to the current scale. Freely available demonstration and production quality applications, including the Integrated Genome Browser, illustrate Genoviz SDK capabilities. CONCLUSION: Separation between graphics components and genomic data models makes it easy for developers to add visualization capability to pre-existing applications or build new applications using third-party data models. Source code, documentation, sample applications, and tutorials are available at http://genoviz.sourceforge.net/.
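The semantic zooming described in this abstract, where an object changes its appearance with zoom level, can be illustrated with a minimal sketch. It is written in Python for brevity and does not use the Genoviz SDK's Java API. The `Feature` class, the function names, the pixel thresholds, and the multiplicative zoom step are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """A hypothetical genomic feature placed on a coordinate axis."""
    name: str
    start: int  # start coordinate in bases
    end: int    # end coordinate in bases (exclusive)


def choose_rendering(feature: Feature, pixels_per_base: float) -> str:
    """Pick a drawing style from how much screen space the feature gets."""
    width_px = (feature.end - feature.start) * pixels_per_base
    if width_px < 2:
        return "tick"         # too narrow to show a shape
    if width_px < 50:
        return "box"          # plain rectangle
    if pixels_per_base < 8:
        return "labeled_box"  # rectangle plus the feature name
    return "sequence"         # enough room to draw individual bases


def next_scale(pixels_per_base: float, steps: int, factor: float = 1.2) -> float:
    """Multiplicative zoom: each step changes the scale by a constant ratio,
    so the absolute change grows or shrinks with the current scale."""
    return pixels_per_base * factor ** steps


gene = Feature("geneA", 1000, 3000)
for scale in (0.0005, 0.01, 0.1, 10.0):
    print(scale, choose_rendering(gene, scale))
```

Keeping the choice of representation in a function that sees only coordinates and scale mirrors the separation between graphics and data models that the abstract emphasizes, although the SDK's own design may differ in detail.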

Subjects

Genomics/methods; Image Interpretation, Computer-Assisted/methods; Programming Languages; Software; Computer Graphics; Databases, Factual; Information Storage and Retrieval/methods; User-Computer Interface

Use of site-directed cysteine and disulfide chemistry to probe protein structure and dynamics: applications to soluble and transmembrane receptors of bacterial chemotaxis.

Bass, Randal B; Butler, Scott L; Chervitz, Stephen A; Gloor, Susan L; Falke, Joseph J.

Methods Enzymol ; 423: 25-51, 2007.

Article in English | MEDLINE | ID: mdl-17609126

ABSTRACT

Site-directed cysteine and disulfide chemistry is broadly useful in the analysis of protein structure and dynamics, and applications of this chemistry to the bacterial chemotaxis pathway have illustrated the kinds of information that can be generated. Notably, in many cases, cysteine and disulfide chemistry can be carried out in the native environment of the protein whether it be aqueous solution, a lipid bilayer, or a multiprotein complex. Moreover, the approach can tackle three types of problems crucial to a molecular understanding of a given protein: (1) it can map out secondary, tertiary, and quaternary structure; (2) it can analyze conformational changes and the structural basis of regulation by covalently trapping specific conformational or signaling states; and (3) it can uncover the spatial and temporal aspects of thermal fluctuations by detecting backbone and domain dynamics. The approach can provide structural information for many proteins inaccessible to high-resolution methods. Even when a high-resolution structure is available, the approach provides complementary information about regulatory mechanisms and thermal dynamics in the native environment. Finally, the approach can be applied to an entire protein, or to a specific domain or subdomain within the full-length protein, thereby facilitating a divide-and-conquer strategy in large systems or multiprotein complexes. Rigorous application of the approach to a given protein, domain, or subdomain requires careful experimental design that adequately resolves the structural and dynamical information provided by the method. A full structural and dynamical analysis begins by scanning engineered cysteines throughout the region of interest. To determine secondary structure, the solvent exposure of each cysteine is determined by measuring its chemical reactivity, and the periodicity of exposure is analyzed. To probe tertiary structure, quaternary structure, and conformational regulation, pairs of cysteines are identified that rapidly form disulfide bonds and that retain function when induced to form a disulfide bond in the folded protein or complex. Finally, to map out thermal fluctuations in a protein of known structure, disulfide formation rates are measured between distal pairs of nonperturbing surface cysteines. This chapter details these methods and illustrates applications to two proteins from the bacterial chemotaxis pathway: the periplasmic galactose binding protein and the transmembrane aspartate receptor.
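One way to picture the analysis of exposure periodicity mentioned in this abstract is a Fourier-style power calculation over per-residue reactivity values. The sketch below is not the chapter's own procedure. It relies only on the textbook geometry that an α-helix turns about 100° per residue (3.6 residues per turn) while a β-strand alternates sides every residue (180°). The function name and the synthetic reactivity profiles are illustrative assumptions.

```python
import math


def periodicity_power(values: list[float], degrees_per_residue: float) -> float:
    """Power of the mean-centred profile at one angular frequency."""
    if not values:
        raise ValueError("need at least one reactivity value")
    mean = sum(values) / len(values)
    omega = math.radians(degrees_per_residue)
    re = sum((v - mean) * math.cos(omega * i) for i, v in enumerate(values))
    im = sum((v - mean) * math.sin(omega * i) for i, v in enumerate(values))
    return re * re + im * im


# Synthetic profiles (made up for illustration, not measured data):
# a strand-like pattern alternates exposed/buried at every position,
# a helix-like pattern repeats every 3.6 positions.
strand_like = [1.0, 0.0] * 6
helix_like = [math.cos(math.radians(100 * i)) for i in range(18)]

for label, profile in (("strand-like", strand_like), ("helix-like", helix_like)):
    helix_power = periodicity_power(profile, 100.0)
    strand_power = periodicity_power(profile, 180.0)
    guess = "helix" if helix_power > strand_power else "strand"
    print(label, "->", guess)
```

Real reactivity data are noisy, so in practice such a comparison would be made over a sliding window and read alongside other evidence.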

Subjects

Biochemistry/methods; Cysteine/chemistry; Mutagenesis, Site-Directed/methods; Bacterial Proteins/chemistry; Chemotaxis; Disulfides/chemistry; Escherichia coli/metabolism; Mutation; Protein Conformation; Protein Engineering; Protein Structure, Secondary; Protein Structure, Tertiary; Salmonella typhimurium/metabolism

A variant by any name: quantifying annotation discordance across tools and clinical databases.

Yen, Jennifer L; Garcia, Sarah; Montana, Aldrin; Harris, Jason; Chervitz, Stephen; Morra, Massimo; West, John; Chen, Richard; Church, Deanna M.

Genome Med ; 9(1): 7, 2017 01 26.

Article in English | MEDLINE | ID: mdl-28122645

ABSTRACT

BACKGROUND: Clinical genomic testing is dependent on the robust identification and reporting of variant-level information in relation to disease. With the shift to high-throughput sequencing, a major challenge for clinical diagnostics is the cross-identification of variants called on their genomic position to resources that rely on transcript- or protein-based descriptions. METHODS: We evaluated the accuracy of three tools (SnpEff, Variant Effect Predictor, and Variation Reporter) that generate transcript and protein-based variant nomenclature from genomic coordinates according to guidelines by the Human Genome Variation Society (HGVS). Our evaluation was based on transcript-controlled comparisons to a manually curated set of 126 test variants of various types drawn from data sources, each with HGVS-compliant transcript and protein descriptors. We further evaluated the concordance between annotations generated by SnpEff and Variant Effect Predictor and those in major germline and cancer databases: ClinVar and COSMIC, respectively. RESULTS: We find that there is substantial discordance between the annotation tools and databases in the description of insertions and/or deletions. Using our ground truth set of variants, constructed specifically to identify challenging events, accuracy was between 80 and 90% for coding and 50 and 70% for protein changes for 114 to 126 variants. Exact concordance for SNV syntax was over 99.5% between ClinVar and Variant Effect Predictor and SnpEff, but less than 90% for non-SNV variants. For COSMIC, exact concordance for coding and protein SNVs was between 65 and 88% and less than 15% for insertions. Across the tools and datasets, there was a wide range of different but equivalent expressions describing protein variants. CONCLUSIONS: Our results reveal significant inconsistency in variant representation across tools and databases. While some of these syntax differences may be clear to a clinician, they can confound variant matching, an important step in variant classification. These results highlight the urgent need for the adoption and adherence to uniform standards in variant annotation, with consistent reporting on the genomic reference, to enable accurate and efficient data-driven clinical care.
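The idea of exact concordance, and the way equivalent but differently written protein descriptions defeat it, can be shown with a small sketch. This is not the authors' evaluation code. The variant keys and annotation strings are invented for illustration, and the function name `exact_concordance` is an assumption.

```python
def exact_concordance(calls_a: dict[str, str], calls_b: dict[str, str]) -> float:
    """Fraction of shared variants whose annotation strings match exactly."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("the two call sets share no variants")
    matches = sum(calls_a[key] == calls_b[key] for key in shared)
    return matches / len(shared)


# Hypothetical protein-level annotations for the same four variants.
# v2 and v4 describe the same change in both sets but are written differently
# (Ter vs *, three-letter vs one-letter amino acid codes).
tool_a = {"v1": "p.Gly12Asp", "v2": "p.Arg97Ter", "v3": "p.Leu5del", "v4": "p.Val600Glu"}
tool_b = {"v1": "p.Gly12Asp", "v2": "p.Arg97*", "v3": "p.Leu5del", "v4": "p.V600E"}

print(exact_concordance(tool_a, tool_b))  # 0.5
```

Plain string comparison counts v2 and v4 as mismatches even though a reader would treat them as identical, which is the kind of syntax difference the abstract says can confound variant matching.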

Subjects

Data Accuracy; Genetic Variation; Genome, Human; Molecular Sequence Annotation/standards; Software/standards; Computational Biology/standards; Databases, Genetic; Humans; INDEL Mutation

Achieving high-sensitivity for clinical applications using augmented exome sequencing.

Patwardhan, Anil; Harris, Jason; Leng, Nan; Bartha, Gabor; Church, Deanna M; Luo, Shujun; Haudenschild, Christian; Pratt, Mark; Zook, Justin; Salit, Marc; Tirch, Jeanie; Morra, Massimo; Chervitz, Stephen; Li, Ming; Clark, Michael; Garcia, Sarah; Chandratillake, Gemma; Kirk, Scott; Ashley, Euan; Snyder, Michael; Altman, Russ; Bustamante, Carlos; Butte, Atul J; West, John; Chen, Richard.

Genome Med ; 7: 71, 2015.

Article in English | MEDLINE | ID: mdl-26269718

ABSTRACT

BACKGROUND: Whole exome sequencing is increasingly used for the clinical evaluation of genetic disease, yet the variation of coverage and sensitivity over medically relevant parts of the genome remains poorly understood. Several sequencing-based assays continue to provide coverage that is inadequate for clinical assessment. METHODS: Using sequence data obtained from the NA12878 reference sample and pre-defined lists of medically-relevant protein-coding and noncoding sequences, we compared the breadth and depth of coverage obtained among four commercial exome capture platforms and whole genome sequencing. In addition, we evaluated the performance of an augmented exome strategy, ACE, that extends coverage in medically relevant regions and enhances coverage in areas that are challenging to sequence. Leveraging reference call-sets, we also examined the effects of improved coverage on variant detection sensitivity. RESULTS: We observed coverage shortfalls with each of the conventional exome-capture and whole-genome platforms across several medically interpretable genes. These gaps included areas of the genome required for reporting recently established secondary findings (ACMG) and known disease-associated loci. The augmented exome strategy recovered many of these gaps, resulting in improved coverage in these areas. At clinically-relevant coverage levels (100 % bases covered at ≥20×), ACE improved coverage among genes in the medically interpretable genome (>90 % covered relative to 10-78 % with other platforms), the set of ACMG secondary finding genes (91 % covered relative to 4-75 % with other platforms) and a subset of variants known to be associated with human disease (99 % covered relative to 52-95 % with other platforms). Improved coverage translated into improvements in sensitivity, with ACE variant detection sensitivities (>97.5 % SNVs, >92.5 % InDels) exceeding that observed with conventional whole-exome and whole-genome platforms. 
CONCLUSIONS: Clinicians should consider analytical performance when making clinical assessments, given that even a few missed variants can lead to reporting false negative results. An augmented exome strategy provides a level of coverage not achievable with other platforms, thus addressing concerns regarding the lack of sensitivity in clinically important regions. In clinical applications where comprehensive coverage of medically interpretable areas of the genome requires higher localized sequencing depth, an augmented exome approach offers both cost and performance advantages over other sequencing-based tests.
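The coverage criterion quoted in this abstract (100 % of bases covered at ≥20×) can be made concrete with a short sketch. It reads the reported percentages as the share of genes that meet the criterion, which is an interpretation rather than something the abstract states outright. The gene names and per-base depths are invented, and the function names are assumptions.

```python
def fraction_covered(depths: list[int], min_depth: int = 20) -> float:
    """Fraction of bases whose read depth reaches min_depth."""
    if not depths:
        raise ValueError("no bases supplied")
    return sum(d >= min_depth for d in depths) / len(depths)


def fraction_fully_covered(gene_depths: dict[str, list[int]], min_depth: int = 20) -> float:
    """Fraction of genes in which every base reaches min_depth."""
    if not gene_depths:
        raise ValueError("no genes supplied")
    full = sum(fraction_covered(d, min_depth) == 1.0 for d in gene_depths.values())
    return full / len(gene_depths)


# Hypothetical per-base depths for two tiny "genes".
gene_depths = {
    "GENE_A": [35, 42, 28, 20],  # every base at >= 20x
    "GENE_B": [31, 19, 40, 25],  # one base falls short
}

print(fraction_covered(gene_depths["GENE_B"]))  # 0.75
print(fraction_fully_covered(gene_depths))      # 0.5
```

A single base below the threshold removes a gene from the fully covered set, which is why the gene-level figure is stricter than the average per-base coverage.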

Subjects

Exome; Sequence Analysis, DNA/methods; Genome, Human; Humans

Variant priorization and analysis incorporating problematic regions of the genome.

Patwardhan, Anil; Clark, Michael; Morgan, Alex; Chervitz, Stephen; Pratt, Mark; Bartha, Gabor; Chandratillake, Gemma; Garcia, Sarah; Leng, Nan; Chen, Richard.

Pac Symp Biocomput ; : 277-87, 2014.

Article in English | MEDLINE | ID: mdl-24297554

ABSTRACT

In case-control studies of rare Mendelian disorders and complex diseases, the power to detect variant and gene-level associations of a given effect size is limited by the size of the study sample. Paradoxically, low statistical power may increase the likelihood that a statistically significant finding is also a false positive. The prioritization of variants based on call quality, putative effects on protein function, the predicted degree of deleteriousness, and allele frequency is often used as a mechanism for reducing the occurrence of false positives, while preserving the set of variants most likely to contain true disease associations. We propose that specificity can be further improved by considering errors that are specific to the regions of the genome being sequenced. These problematic regions (PRs) are identified a-priori and are used to down-weight constitutive variants in a case-control analysis. Using samples drawn from 1000-Genomes, we illustrate the utility of PRs in identifying true variant and gene associations using a case-control study on a known Mendelian disease, cystic fibrosis (CF).
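Down-weighting variants that fall inside problematic regions might look like the following sketch. The abstract does not give the weighting scheme, so the penalty value, the half-open interval convention, the summed burden score, and all names and coordinates here are assumptions made for illustration.

```python
Region = tuple[str, int, int]  # (chromosome, start, end), end exclusive
Variant = tuple[str, int]      # (chromosome, position)


def variant_weight(variant: Variant, problematic: list[Region], penalty: float = 0.25) -> float:
    """Return the reduced weight for a variant inside any problematic region, else 1.0."""
    chrom, pos = variant
    for r_chrom, start, end in problematic:
        if chrom == r_chrom and start <= pos < end:
            return penalty
    return 1.0


def weighted_burden(variants: list[Variant], problematic: list[Region], penalty: float = 0.25) -> float:
    """Sum of per-variant weights, usable as a gene-level burden score."""
    return sum(variant_weight(v, problematic, penalty) for v in variants)


# Hypothetical region list and variant calls.
regions = [("chr7", 100, 200)]
calls = [("chr7", 150), ("chr7", 250), ("chr1", 150)]

print(weighted_burden(calls, regions))  # 2.25
```

Comparing such weighted scores between cases and controls, instead of raw variant counts, reduces the influence of calls from error-prone regions without discarding them entirely.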

Subjects

Genetic Variation; Genome, Human; Case-Control Studies; Computational Biology; Cystic Fibrosis/genetics; Cystic Fibrosis Transmembrane Conductance Regulator/genetics; Databases, Genetic/statistics & numerical data; Exome; Genetic Association Studies/statistics & numerical data; Genomic Library; Human Genome Project; Humans; Precision Medicine/statistics & numerical data; Sample Size; Sequence Alignment/statistics & numerical data

Data standards for Omics data: the basis of data sharing and reuse.

Chervitz, Stephen A; Deutsch, Eric W; Field, Dawn; Parkinson, Helen; Quackenbush, John; Rocca-Serra, Phillipe; Sansone, Susanna-Assunta; Stoeckert, Christian J; Taylor, Chris F; Taylor, Ronald; Ball, Catherine A.

Methods Mol Biol ; 719: 31-69, 2011.

Article in English | MEDLINE | ID: mdl-21370078

ABSTRACT

To facilitate sharing of Omics data, many groups of scientists have been working to establish the relevant data standards. The main components of data sharing standards are experiment description standards, data exchange standards, terminology standards, and experiment execution standards. Here we provide a survey of existing and emerging standards that are intended to assist the free and open exchange of large-format data.

Subjects

Computational Biology/standards; Information Dissemination/methods; Computational Biology/methods; Delivery of Health Care/standards; Humans; Reference Standards; Research Design/standards

The Bioperl toolkit: Perl modules for the life sciences.

Stajich, Jason E; Block, David; Boulez, Kris; Brenner, Steven E; Chervitz, Stephen A; Dagdigian, Chris; Fuellen, Georg; Gilbert, James G R; Korf, Ian; Lapp, Hilmar; Lehväslaiho, Heikki; Matsalla, Chad; Mungall, Chris J; Osborne, Brian I; Pocock, Matthew R; Schattner, Peter; Senger, Martin; Stein, Lincoln D; Stupka, Elia; Wilkinson, Mark D; Birney, Ewan.

Genome Res ; 12(10): 1611-8, 2002 Oct.

Article in English | MEDLINE | ID: mdl-12368254

ABSTRACT

The Bioperl project is an international open-source collaboration of biologists, bioinformaticians, and computer scientists that has evolved over the past 7 yr into the most comprehensive library of Perl modules available for managing and manipulating life-science information. Bioperl provides an easy-to-use, stable, and consistent programming interface for bioinformatics application programmers. The Bioperl modules have been successfully and repeatedly used to reduce otherwise complex tasks to only a few lines of code. The Bioperl object model has been proven to be flexible enough to support enterprise-level applications such as EnsEMBL, while maintaining an easy learning curve for novice Perl programmers. Bioperl is capable of executing analyses and processing results from programs such as BLAST, ClustalW, or the EMBOSS suite. Interoperation with modules written in Python and Java is supported through the evolving BioCORBA bridge. Bioperl provides access to data stores such as GenBank and SwissProt via a flexible series of sequence input/output modules, and to the emerging common sequence data storage format of the Open Bioinformatics Database Access project. This study describes the overall architecture of the toolkit, the problem domains that it addresses, and gives specific examples of how the toolkit can be used to solve common life-sciences problems. We conclude with a discussion of how the open-source nature of the project has contributed to the development effort.

Subjects

Biological Science Disciplines/methods; Computational Biology/methods; Algorithms; Animals; Biological Science Disciplines/trends; Computational Biology/trends; Computer Graphics; Database Management Systems; Databases, Genetic; Humans; Internet; Online Systems; Software; Software Design; Systems Integration
