Results 1 - 10 of 10
1.
Pac Symp Biocomput ; 28: 383-394, 2023.
Article in English | MEDLINE | ID: mdl-36540993

ABSTRACT

As the diversity of genomic variation data increases with our growing understanding of the role of variation in health and disease, it is critical to develop standards for precise inter-system exchange of these data for research and clinical applications. The Global Alliance for Genomics and Health (GA4GH) Variation Representation Specification (VRS) meets this need through a technical terminology and information model for disambiguating and concisely representing variation concepts. Here we discuss the recent Genotype model in VRS, which may be used to represent the allelic composition of a genetic locus. We demonstrate the use of the Genotype model and the constituent Haplotype model for the precise and interoperable representation of pharmacogenomic diplotypes, HGVS variants, and VCF records using VRS and discuss how this can be leveraged to enable interoperable exchange and search operations between assayed variation and genomic knowledgebases.
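
The relationship between a Genotype and its constituent Haplotypes can be pictured with a small data model. The sketch below is a simplified illustration only: the class names mirror the concepts named in the abstract, but the fields (`label`, `alleles`, `variation`, `count`) and the consistency check are assumptions made for this example and are not the VRS schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Haplotype:
    """Alleles found together on one molecule (simplified, hypothetical fields)."""
    label: str                    # illustrative name, e.g. a star allele
    alleles: Tuple[str, ...] = ()  # allele descriptions kept as plain strings here


@dataclass(frozen=True)
class GenotypeMember:
    """One haplotype plus the number of copies of it observed at the locus."""
    variation: Haplotype
    count: int


@dataclass
class Genotype:
    """Allelic composition of a locus: its members and the total copy number."""
    members: List[GenotypeMember]
    count: int  # total molecules at the locus (2 for a diploid locus)

    def is_consistent(self) -> bool:
        # Copies assigned to members cannot exceed the locus copy number.
        return sum(member.count for member in self.members) <= self.count


# A pharmacogenomic-style diplotype, using placeholder star-allele labels.
star1 = Haplotype(label="*1")
star2 = Haplotype(label="*2", alleles=("hypothetical-allele-A",))

heterozygous = Genotype([GenotypeMember(star1, 1), GenotypeMember(star2, 1)], count=2)
homozygous = Genotype([GenotypeMember(star2, 2)], count=2)
```

Recording a copy count on each member lets a homozygous diplotype be written with a single member rather than a repeated one.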


Subject(s)
Computational Biology; Genetic Variation; Humans; Databases, Genetic; Genomics; Genotype
2.
Cell Genom ; 1(2), 2021 Nov 10.
Article in English | MEDLINE | ID: mdl-35311178

ABSTRACT

Maximizing the personal, public, research, and clinical value of genomic information will require the reliable exchange of genetic variation data. We report here the Variation Representation Specification (VRS, pronounced "verse"), an extensible framework for the computable representation of variation that complements contemporary human-readable and flat file standards for genomic variation representation. VRS provides semantically precise representations of variation and leverages this design to enable federated identification of biomolecular variation with globally consistent and unique computed identifiers. The VRS framework includes a terminology and information model, machine-readable schema, data sharing conventions, and a reference implementation, each of which is intended to be broadly useful and freely available for community use. VRS was developed by a partnership among national information resource providers, public initiatives, and diagnostic testing laboratories under the auspices of the Global Alliance for Genomics and Health (GA4GH).
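
The idea of a "computed identifier" can be illustrated as a digest over a canonical serialization of an object, so that independent systems derive the same identifier from the same content. The sketch below is a simplification under stated assumptions: the canonical form (JSON with sorted keys and no whitespace), the truncated SHA-512 digest, the `ga4gh:VA.` prefix, and the example dictionary are all choices made for illustration, and the full specification has further serialization rules that are not reproduced here.

```python
import base64
import hashlib
import json


def truncated_digest(blob: bytes) -> str:
    """SHA-512 digest, truncated to 24 bytes, base64url-encoded (32 characters)."""
    return base64.urlsafe_b64encode(hashlib.sha512(blob).digest()[:24]).decode("ascii")


def computed_identifier(obj: dict, prefix: str = "VA") -> str:
    """Derive an identifier from content alone (simplified sketch)."""
    # Canonical form assumed here: sorted keys, no insignificant whitespace, UTF-8.
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return f"ga4gh:{prefix}.{truncated_digest(canonical)}"


# Two systems that build the same object with different key order
# still arrive at the same identifier.
a = computed_identifier({"type": "Example", "start": 10, "end": 11, "state": "T"})
b = computed_identifier({"state": "T", "end": 11, "start": 10, "type": "Example"})
```

Because the identifier depends only on content, no central registry is needed to agree on it, which is what makes federated identification possible.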

3.
PLoS One ; 15(12): e0239883, 2020.
Article in English | MEDLINE | ID: mdl-33270643

ABSTRACT

MOTIVATION: Access to biological sequence data, such as genome, transcript, or protein sequence, is at the core of many bioinformatics analysis workflows. The National Center for Biotechnology Information (NCBI), Ensembl, and other sequence database maintainers provide methods to access sequences through network connections. For many users, the convenience and currency of remotely managed data are compelling, and the network latency is inconsequential. However, for high-throughput and clinical applications, local sequence collections are essential for performance, stability, privacy, and reproducibility.
RESULTS: Here we describe SeqRepo, a novel system for building a local, high-performance, non-redundant collection of biological sequences. SeqRepo enables clients to use primary database identifiers and several digests to identify sequences and sequence aliases. SeqRepo provides a native Python interface and a REST interface, which can run locally and enables access from other programming languages. SeqRepo also provides an alternative REST interface based on the GA4GH refget protocol. SeqRepo provides fast random access to sequence slices. We provide results that demonstrate that a local SeqRepo sequence collection yields significant performance benefits of up to 1300-fold over remote sequence collections. In our use case for a variant validation and normalization pipeline, SeqRepo improved throughput 50-fold relative to use with remote sequences. SeqRepo may be used with any species or sequence type. Regular snapshots of human sequence collections are available. It is often convenient or necessary to use a computed digest as a sequence identifier. For example, a digest-based identifier may be used to refer to proprietary reference genomes or segments of a graph genome, for which conventional identifiers will not be available. Here we also introduce a convention for the application of the SHA-512 hashing algorithm with Base64 encoding to generate URL-safe identifiers. This convention, sha512t24u, combines a fast digest mechanism with a space-efficient representation that can be used for any object. Our report includes an analysis of timing and collision probabilities for sha512t24u. SeqRepo enables clients to use sha512t24u as identifiers, thereby seamlessly integrating public and private sequence sets.
AVAILABILITY: SeqRepo is released under the Apache License 2.0 and is available on GitHub and PyPI. Docker images and database snapshots are also available. See https://github.com/biocommons/biocommons.seqrepo.
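
The sha512t24u convention described above can be sketched with the Python standard library. This reading takes the name to mean SHA-512, truncated to 24 bytes, then URL-safe Base64; the function names are chosen for this example. The collision estimate uses the standard birthday approximation for a 192-bit digest and is not the analysis from the report.

```python
import base64
import hashlib


def sha512t24u(blob: bytes) -> str:
    """SHA-512 digest of blob, truncated to 24 bytes, base64url-encoded."""
    digest = hashlib.sha512(blob).digest()
    # 24 bytes encode to exactly 32 Base64 characters, so no '=' padding appears.
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


def collision_probability(n_objects: int, bits: int = 192) -> float:
    """Birthday approximation: p is roughly n(n-1)/2 divided by 2**bits."""
    return n_objects * (n_objects - 1) / 2 / 2**bits


print(sha512t24u(b""))  # z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXc
```

The output uses only letters, digits, `-`, and `_`, so it can be placed in a URL path without escaping.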


Subject(s)
Computational Biology/methods; Databases, Nucleic Acid; Software; Algorithms; Genome, Human/genetics; Genomics/methods; Humans; Programming Languages; Reproducibility of Results
4.
Hum Mutat ; 39(12): 1803-1813, 2018 12.
Article in English | MEDLINE | ID: mdl-30129167

ABSTRACT

The Human Genome Variation Society (HGVS) nomenclature guidelines encourage the accurate and standard description of DNA, RNA, and protein sequence variants in public variant databases and the scientific literature. Inconsistent application of the HGVS guidelines can lead to misinterpretation of variants in clinical settings. Reliable software tools are essential to ensure consistent application of the HGVS guidelines when reporting and interpreting variants. We present the hgvs Python package, a comprehensive tool for manipulating sequence variants according to the HGVS nomenclature guidelines. Distinguishing features of the hgvs package include: (1) parsing, formatting, validating, and normalizing variants on genome, transcript, and protein sequences; (2) projecting variants between aligned sequences, including those with gapped alignments; (3) flexible installation using remote or local data (fully local installations eliminate network dependencies); (4) extensive automated tests; and (5) open source development by a community from eight organizations worldwide. This report summarizes recent and significant updates to the hgvs package since its original release in 2014, and presents results of extensive validation using clinically relevant variants from ClinVar and HGMD.
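
One piece of "normalizing" can be shown on a toy sequence: HGVS describes a deletion inside a repeat at its most 3' position. The sketch below is an illustration on plain strings, assuming a forward-strand sequence and 0-based half-open coordinates; the function name is hypothetical and this is not the hgvs package's normalizer.

```python
def shift_deletion_3prime(seq: str, start: int, end: int) -> tuple:
    """Shift the deleted interval seq[start:end] as far 3' as it can go.

    Sliding the window one base to the right leaves the resulting sequence
    unchanged exactly when the base leaving the window on the left equals
    the base entering it on the right.
    """
    while end < len(seq) and seq[start] == seq[end]:
        start += 1
        end += 1
    return start, end


# Deleting any single T in the run yields the same sequence, so the
# description is moved to the last T of the run.
seq = "GATTTTC"
start, end = shift_deletion_3prime(seq, 2, 3)
print(start, end)  # 5 6
```

In 1-based HGVS coordinates the shifted deletion would be written at position 6 of this toy sequence, so every equivalent description of the deletion collapses to one.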


Subject(s)
Computational Biology/methods; Databases, Genetic; Genetic Variation; Genome, Human; Guidelines as Topic; Humans; Societies, Medical; Software
5.
Hum Mutat ; 39(1): 61-68, 2018 01.
Article in English | MEDLINE | ID: mdl-28967166

ABSTRACT

The Human Genome Variation Society (HGVS) variant nomenclature is widely used to describe sequence variants in scientific publications, clinical reports, and databases. However, the HGVS recommendations are complex, and this often results in inaccurate variant descriptions being reported. The open-source hgvs Python package (https://github.com/biocommons/hgvs) provides a programmatic interface for parsing, manipulating, formatting, and validating variants according to the HGVS recommendations, but does not provide a user-friendly Web interface. We have developed a Web-based variant validation tool, VariantValidator (https://variantvalidator.org/), which utilizes the hgvs Python package and provides additional functionality to assist users who wish to accurately describe and report sequence-level variations that are compliant with the HGVS recommendations. VariantValidator was designed to ensure that users are guided through the intricacies of the HGVS nomenclature. For example, if the user makes a mistake, VariantValidator automatically corrects the mistake if it can, or provides helpful guidance if it cannot. In addition, VariantValidator has the facility to interconvert genomic variant descriptions in HGVS and Variant Call Format with a degree of accuracy that surpasses most competing solutions.
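
For the simplest case, a single-nucleotide substitution, interconversion between Variant Call Format fields and a genomic HGVS description can be sketched as follows. This is an illustration, not VariantValidator's implementation: the function names and the one-entry chromosome lookup are assumptions, the coordinates are illustrative, and insertions and deletions (which need an anchor base in VCF) are deliberately left out.

```python
import re

# Hypothetical lookup table; a single GRCh38 chromosome is listed for illustration.
CHROM_TO_ACCESSION = {"17": "NC_000017.11"}
ACCESSION_TO_CHROM = {ac: chrom for chrom, ac in CHROM_TO_ACCESSION.items()}

_HGVS_G_SNV = re.compile(r"^(NC_\d+\.\d+):g\.(\d+)([ACGT])>([ACGT])$")


def vcf_snv_to_hgvs_g(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Convert single-nucleotide VCF fields to an HGVS g. expression."""
    if len(ref) != 1 or len(alt) != 1 or ref == alt:
        raise ValueError("only single-nucleotide substitutions are handled here")
    key = chrom[3:] if chrom.startswith("chr") else chrom
    # VCF POS and HGVS g. positions are both 1-based, so no offset is needed.
    return f"{CHROM_TO_ACCESSION[key]}:g.{pos}{ref}>{alt}"


def hgvs_g_snv_to_vcf(expression: str) -> tuple:
    """Convert a simple HGVS g. substitution back to (chrom, pos, ref, alt)."""
    match = _HGVS_G_SNV.match(expression)
    if match is None:
        raise ValueError(f"not a simple g. substitution: {expression!r}")
    accession, pos, ref, alt = match.groups()
    return ACCESSION_TO_CHROM[accession], int(pos), ref, alt


expression = vcf_snv_to_hgvs_g("chr17", 43045712, "T", "C")
print(expression)  # NC_000017.11:g.43045712T>C
```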


Subject(s)
Computational Biology/methods; Genetic Variation; Software; Chromosome Mapping/methods; Databases, Genetic; Exons; Humans; Introns; Reproducibility of Results; Sequence Analysis, DNA/methods; User-Computer Interface; Web Browser
6.
Genome Med ; 8(1): 117, 2016 11 04.
Article in English | MEDLINE | ID: mdl-27814769

ABSTRACT

BACKGROUND: To truly achieve personalized medicine in oncology, it is critical to catalog and curate cancer sequence variants for their clinical relevance. The Somatic Working Group (WG) of the Clinical Genome Resource (ClinGen), in cooperation with ClinVar and multiple cancer variant curation stakeholders, has developed a consensus set of minimal variant level data (MVLD). MVLD is a framework of standardized data elements to curate cancer variants for clinical utility. With implementation of MVLD standards, and in a working partnership with ClinVar, we aim to streamline the somatic variant curation efforts in the community and reduce redundancy and time burden for the interpretation of cancer variants in clinical practice.
METHODS: We developed MVLD through a consensus approach by i) reviewing clinical actionability interpretations from institutions participating in the WG, ii) conducting an extensive literature search of clinical somatic interpretation schemas, and iii) surveying cancer variant web portals. A forthcoming guideline on cancer variant interpretation, from the Association for Molecular Pathology (AMP), can be incorporated into MVLD.
RESULTS: Along with harmonizing standardized terminology for allele interpretive and descriptive fields that are collected by many databases, the MVLD includes unique fields for cancer variants such as Biomarker Class, Therapeutic Context and Effect. In addition, MVLD includes recommendations for controlled semantics and ontologies. The Somatic WG is collaborating with ClinVar to evaluate MVLD use for somatic variant submissions. ClinVar is an open and centralized repository where sequencing laboratories can report summary-level variant data with clinical significance, and ClinVar accepts cancer variant data.
CONCLUSIONS: We expect the use of the MVLD to streamline clinical interpretation of cancer variants, enhance interoperability among multiple redundant curation efforts, and increase submission of somatic variants to ClinVar, all of which will enhance translation to clinical oncology practice.


Subject(s)
Data Curation/standards; Genetic Variation; Neoplasms/genetics; Algorithms; Databases, Genetic; Gene Frequency; Humans; Precision Medicine
7.
Hum Mutat ; 37(6): 564-9, 2016 06.
Article in English | MEDLINE | ID: mdl-26931183

ABSTRACT

The consistent and unambiguous description of sequence variants is essential to report and exchange information on the analysis of a genome. In particular, DNA diagnostics critically depends on accurate and standardized description and sharing of the variants detected. The sequence variant nomenclature system proposed in 2000 by the Human Genome Variation Society has been widely adopted and has developed into an internationally accepted standard. The recommendations are currently commissioned through a Sequence Variant Description Working Group (SVD-WG) operating under the auspices of three international organizations: the Human Genome Variation Society (HGVS), the Human Variome Project (HVP), and the Human Genome Organization (HUGO). Requests for modifications and extensions go through the SVD-WG following a standard procedure including a community consultation step. Version numbers are assigned to the nomenclature system to allow users to specify the version used in their variant descriptions. Here, we present the current recommendations, HGVS version 15.11, and briefly summarize the changes that were made since the 2000 publication. The main focus has been on removing inconsistencies and tightening definitions to allow automatic data processing. An extensive version of the recommendations is available online, at http://www.HGVS.org/varnomen.


Subject(s)
Genetic Variation; Human Genome Project/organization & administration; Terminology as Topic; Genome, Human; Guidelines as Topic; Humans; Sequence Analysis, DNA
9.
Bioinformatics ; 31(2): 268-70, 2015 Jan 15.
Article in English | MEDLINE | ID: mdl-25273102

ABSTRACT

Biological sequence variants are commonly represented in scientific literature, clinical reports and databases of variation using the mutation nomenclature guidelines endorsed by the Human Genome Variation Society (HGVS). Despite the widespread use of the standard, no comprehensive, freely available programming libraries exist. Here we report an open-source and easy-to-use Python library that facilitates the parsing, manipulation, formatting and validation of variants according to the HGVS specification. The current implementation focuses on the subset of the HGVS recommendations that precisely describe sequence-level variation relevant to the application of high-throughput sequencing to clinical diagnostics.
AVAILABILITY AND IMPLEMENTATION: The package is released under the Apache 2.0 open-source license. Source code, documentation and issue tracking are available at http://bitbucket.org/hgvs/hgvs/. Python packages are available at PyPI (https://pypi.python.org/pypi/hgvs).
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
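
What "parsing, formatting and validation" involve can be shown on the smallest case, a substitution such as `NC_000001.11:g.5A>G`. The sketch below uses only the standard library and is not the hgvs package's API: the class, the regular expression, and the toy reference sequence are assumptions for illustration, and the validation step assumes the supplied sequence is indexed in the variant's own coordinate system starting at position 1.

```python
import re
from dataclasses import dataclass

_SUBSTITUTION = re.compile(
    r"^(?P<ac>[A-Z]{2}_\d+\.\d+):(?P<kind>[gcnm])\.(?P<pos>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


@dataclass(frozen=True)
class Substitution:
    accession: str  # reference sequence accession with version
    kind: str       # coordinate type: g, c, n, or m
    position: int   # 1-based position
    ref: str
    alt: str

    def __str__(self) -> str:
        # Formatting is the inverse of parsing.
        return f"{self.accession}:{self.kind}.{self.position}{self.ref}>{self.alt}"


def parse_substitution(text: str) -> Substitution:
    """Parse a simple substitution; raise ValueError on anything else."""
    match = _SUBSTITUTION.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported or malformed expression: {text!r}")
    return Substitution(match["ac"], match["kind"], int(match["pos"]),
                        match["ref"], match["alt"])


def ref_matches(variant: Substitution, sequence: str) -> bool:
    """Validate that the stated reference base agrees with the sequence."""
    index = variant.position - 1
    return 0 <= index < len(sequence) and sequence[index] == variant.ref


variant = parse_substitution("NC_000001.11:g.5A>G")
toy_reference = "CCCCAT"  # made-up sequence, not real genomic data
```

Here `str(variant)` reproduces the input expression, and `ref_matches(variant, toy_reference)` confirms that the stated reference base agrees with position 5 of the toy sequence.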


Subject(s)
Computational Biology/methods; Databases, Factual; Genetic Variation/genetics; Genome, Human; Software; Terminology as Topic; Humans; Molecular Sequence Annotation
10.
Pac Symp Biocomput ; : 403-14, 2009.
Article in English | MEDLINE | ID: mdl-19209718

ABSTRACT

This paper describes the design and applications of Unison, a comprehensive and integrated warehouse of protein sequences, diverse precomputed predictions, and other biological data. Unison provides a practical solution to the burden of preparing data for computational discovery projects, enables holistic feature-based mining queries regarding protein composition and functions, and provides a foundation for the development of new tools. Unison is available for immediate use online via direct database connections and a web interface. In addition, the database schema, command line tools, web interface, and non-proprietary precomputed predictions are released under the Academic Free License and available for download at http://unison-db.org/. This project has resulted in a system that significantly reduces several practical impediments to the initiation of computational biology discovery projects.


Subject(s)
Biometry/methods; Databases, Protein; Amino Acid Motifs; Databases, Factual; Internet; Receptors, Immunologic/chemistry; Receptors, Immunologic/genetics; Software; User-Computer Interface