Results 1-20 of 38
1.
Nucleic Acids Res ; 47(D1): D464-D474, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30357411

ABSTRACT

The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB, rcsb.org), the US data center for the global PDB archive, serves thousands of Data Depositors in the Americas and Oceania and makes 3D macromolecular structure data available at no charge and without usage restrictions to more than 1 million rcsb.org Users worldwide and 600 000 pdb101.rcsb.org education-focused Users around the globe. PDB Data Depositors include structural biologists using macromolecular crystallography, nuclear magnetic resonance spectroscopy and 3D electron microscopy. PDB Data Consumers include researchers, educators and students studying Fundamental Biology, Biomedicine, Biotechnology and Energy. Recent reorganization of RCSB PDB activities into four integrated, interdependent services is described in detail, together with tools and resources added over the past 2 years to RCSB PDB web portals in support of a 'Structural View of Biology.'


Subject(s)
Databases, Protein , Protein Conformation , Biomedical Research/education , Biotechnology/education , Data Curation , Software
2.
PLoS Comput Biol ; 15(4): e1006842, 2019 04.
Article in English | MEDLINE | ID: mdl-31009453

ABSTRACT

Many proteins fold into highly regular and repetitive three-dimensional structures. The analysis of structural patterns and repeated elements is fundamental to understanding protein function and evolution. We present recent improvements to the CE-Symm tool for systematically detecting and analyzing the internal symmetry and structural repeats in proteins. In addition to the accurate detection of internal symmetry, the tool is now capable of i) reporting the type of symmetry, ii) identifying the smallest repeating unit, iii) describing the arrangement of repeats with transformation operations and symmetry axes, and iv) comparing the similarity of all the internal repeats at the residue level. CE-Symm 2.0 helps the user investigate proteins with a robust and intuitive sequence-to-structure analysis, with many applications in protein classification, functional annotation and evolutionary studies. We describe the algorithmic extensions of the method and demonstrate its applications to the study of interesting cases of protein evolution.


Subject(s)
Algorithms , Computational Biology/methods , Proteins/chemistry , Software , Amino Acid Sequence , Databases, Protein , Models, Molecular , Sequence Analysis, Protein
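CE-Symm itself works on 3D coordinates by aligning a structure against itself. As a much simpler, sequence-level analogy of the "smallest repeating unit" idea only (not the CE-Symm method), the sketch below finds the shortest unit whose exact tandem repetition reproduces a string, using the KMP prefix function. The function names are hypothetical.

```python
from typing import List


def prefix_function(s: str) -> List[int]:
    """KMP prefix function: pi[i] is the length of the longest proper prefix
    of s[: i + 1] that is also a suffix of it."""
    pi = [0] * len(s)
    for i in range(1, len(s)):
        k = pi[i - 1]
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi


def smallest_repeating_unit(s: str) -> str:
    """Hypothetical helper: shortest u with s == u * m (m >= 1).
    Returns s unchanged when s is not an exact tandem repeat."""
    if not s:
        return s
    period = len(s) - prefix_function(s)[-1]
    return s[:period] if len(s) % period == 0 else s


if __name__ == "__main__":
    # Toy strings; real structural repeats are approximate, not exact copies.
    print(smallest_repeating_unit("GLYPGLYPGLYP"))  # GLYP
    print(smallest_repeating_unit("MKTAYIAK"))      # MKTAYIAK
```

Exact string periodicity is a far stricter test than structural symmetry, which tolerates substantial sequence divergence between repeats; that gap is why a structure-based tool is needed.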
3.
PLoS Comput Biol ; 15(2): e1006791, 2019 02.
Article in English | MEDLINE | ID: mdl-30735498

ABSTRACT

BioJava is an open-source project that provides a Java library for processing biological data. The project aims to simplify bioinformatic analyses by implementing parsers, data structures, and algorithms for common tasks in genomics, structural biology, ontologies, phylogenetics, and more. Since 2012, we have released two major versions of the library (4 and 5) that include many new features to tackle challenges with increasingly complex macromolecular structure data. BioJava requires Java 8 or higher and is freely available under the LGPL 2.1 license. The project is hosted on GitHub at https://github.com/biojava/biojava. More information and documentation can be found online on the BioJava website (http://www.biojava.org) and tutorial (https://github.com/biojava/biojava-tutorial). All inquiries should be directed to the GitHub page or the BioJava mailing list (http://lists.open-bio.org/mailman/listinfo/biojava-l).


Subject(s)
Computational Biology/methods , Access to Information , Algorithms , Gene Library , Genome/genetics , Genomics , Information Storage and Retrieval , Internet , Software
4.
Bioinformatics ; 34(21): 3755-3758, 2018 11 01.
Article in English | MEDLINE | ID: mdl-29850778

ABSTRACT

Motivation: The interactive visualization of very large macromolecular complexes on the web is becoming a challenging problem as experimental techniques advance at an unprecedented rate and deliver structures of increasing size. Results: We have tackled this problem by developing highly memory-efficient and scalable extensions for the NGL WebGL-based molecular viewer and by using Macromolecular Transmission Format (MMTF), a binary and compressed format. These enable NGL to download and render molecular complexes with millions of atoms interactively on desktop computers and smartphones alike, making it a tool of choice for web-based molecular visualization in research and education. Availability and implementation: The source code is freely available under the MIT license at github.com/arose/ngl and distributed on NPM (npmjs.com/package/ngl). MMTF-JavaScript encoders and decoders are available at github.com/rcsb/mmtf-javascript.


Subject(s)
Computer Graphics , Internet , Macromolecular Substances , Software
5.
Nucleic Acids Res ; 45(D1): D271-D281, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27794042

ABSTRACT

The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB, http://rcsb.org), the US data center for the global PDB archive, makes PDB data freely available to all users, from structural biologists to computational biologists and beyond. New tools and resources have been added to the RCSB PDB web portal in support of a 'Structural View of Biology.' Recent developments have improved the User experience, including the high-speed NGL Viewer that provides 3D molecular visualization in any web browser, improved support for data file download and enhanced organization of website pages for query, reporting and individual structure exploration. Structure validation information is now visible for all archival entries. PDB data have been integrated with external biological resources, including chromosomal position within the human genome; protein modifications; and metabolic pathways. PDB-101 educational materials have been reorganized into a searchable website and expanded to include new features such as the Geis Digital Archive.


Subject(s)
Computational Biology/methods , Databases, Genetic , Proteins/chemistry , Proteins/genetics , Datasets as Topic , Metabolic Networks and Pathways , Models, Molecular , Protein Conformation , Proteins/metabolism , Software , Structure-Activity Relationship , User-Computer Interface , Web Browser
6.
Hum Mutat ; 39(12): 1803-1813, 2018 12.
Article in English | MEDLINE | ID: mdl-30129167

ABSTRACT

The Human Genome Variation Society (HGVS) nomenclature guidelines encourage the accurate and standard description of DNA, RNA, and protein sequence variants in public variant databases and the scientific literature. Inconsistent application of the HGVS guidelines can lead to misinterpretation of variants in clinical settings. Reliable software tools are essential to ensure consistent application of the HGVS guidelines when reporting and interpreting variants. We present the hgvs Python package, a comprehensive tool for manipulating sequence variants according to the HGVS nomenclature guidelines. Distinguishing features of the hgvs package include: (1) parsing, formatting, validating, and normalizing variants on genome, transcript, and protein sequences; (2) projecting variants between aligned sequences, including those with gapped alignments; (3) flexible installation using remote or local data (fully local installations eliminate network dependencies); (4) extensive automated tests; and (5) open source development by a community from eight organizations worldwide. This report summarizes recent and significant updates to the hgvs package since its original release in 2014, and presents results of extensive validation using clinically relevant variants from ClinVar and HGMD.


Subject(s)
Computational Biology/methods , Databases, Genetic , Genetic Variation , Genome, Human , Guidelines as Topic , Humans , Societies, Medical , Software
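The abstract lists parsing and formatting as core features of the hgvs package. Purely to illustrate what "parsing" means for the simplest HGVS case, here is a hypothetical regex-based parser for DNA substitutions such as `NM_004006.2:c.4375C>T` (used here only as a syntactic example). It is not the hgvs package's API, which implements the full grammar and can also validate against reference sequences; every name below is illustrative.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately tiny grammar for DNA substitutions only:
#   <RefSeq-style accession>:<g|c|n|m>.<positive integer position><ref>><alt>
# No negative or intronic positions, ranges, indels, RNA or protein syntax.
# This is NOT the hgvs package's parser.
_SUBSTITUTION = re.compile(
    r"^(?P<ac>[A-Z]{2}_\d+\.\d+):(?P<type>[gcnm])\.(?P<pos>[1-9]\d*)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


@dataclass(frozen=True)
class SimpleSubstitution:
    ac: str    # sequence accession with version, e.g. "NM_004006.2"
    type: str  # g (genomic), c (coding), n (non-coding), m (mitochondrial)
    pos: int   # 1-based position
    ref: str   # reference base
    alt: str   # alternate base

    def format(self) -> str:
        """Render back to the same HGVS-style string."""
        return f"{self.ac}:{self.type}.{self.pos}{self.ref}>{self.alt}"


def parse_simple_substitution(text: str) -> SimpleSubstitution:
    m = _SUBSTITUTION.match(text.strip())
    if m is None:
        raise ValueError(f"unsupported or malformed variant: {text!r}")
    if m["ref"] == m["alt"]:
        raise ValueError("reference and alternate bases must differ")
    return SimpleSubstitution(m["ac"], m["type"], int(m["pos"]), m["ref"], m["alt"])


if __name__ == "__main__":
    v = parse_simple_substitution("NM_004006.2:c.4375C>T")
    print(v.pos, v.ref, v.alt)  # 4375 C T
    print(v.format())           # NM_004006.2:c.4375C>T
```

Round-tripping (parse, then format back to the identical string) is a cheap first check; normalization and validation against the reference sequence, which the hgvs package also performs, are out of scope here.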
7.
Bioinformatics ; 33(13): 2047-2049, 2017 Jul 01.
Article in English | MEDLINE | ID: mdl-28334105

ABSTRACT

SUMMARY: We developed a new software tool, BioJava-ModFinder, for identifying protein modifications observed in 3D structures archived in the Protein Data Bank (PDB). Information on more than 400 types of protein modifications was collected and curated from annotations in PDB, RESID, and PSI-MOD. We divided these modifications into three categories: modified residues, attachment modifications, and cross-links. We have developed a systematic method to identify these modifications in 3D protein structures. We have integrated this package with the RCSB PDB web application and added protein modification annotations to the sequence diagram and structure display. By scanning all 3D structures in the PDB using BioJava-ModFinder, we identified more than 30 000 structures with protein modifications, which can be searched, browsed, and visualized on the RCSB PDB website. AVAILABILITY AND IMPLEMENTATION: BioJava-ModFinder is available as open source (LGPL license) at https://github.com/biojava/biojava/tree/master/biojava-modfinder. The RCSB PDB can be accessed at http://www.rcsb.org. CONTACT: pwrose@ucsd.edu.


Subject(s)
Computational Biology/methods , Databases, Protein , Protein Conformation , Software , Internet
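The abstract does not state ModFinder's detection criteria. One common geometric idea behind detecting a cross-link such as a disulfide bond is a distance check between the two sulfur (SG) atoms of cysteine residues. The sketch below shows only that idea on toy coordinates; the `Atom` type, the function name and the 2.5 Å cutoff (an assumed tolerance around the roughly 2.05 Å S-S bond length) are hypothetical and not taken from BioJava-ModFinder.

```python
import math
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    chain: str
    res_num: int
    res_name: str
    atom_name: str
    xyz: Tuple[float, float, float]  # Cartesian coordinates in angstroms


def find_disulfide_candidates(
    atoms: List[Atom], cutoff: float = 2.5
) -> List[Tuple[Atom, Atom, float]]:
    """Return pairs of cysteine SG atoms no farther apart than `cutoff` angstroms.

    The 2.5 A default is an assumed tolerance, not a ModFinder parameter."""
    sulfurs = [a for a in atoms if a.res_name == "CYS" and a.atom_name == "SG"]
    hits = []
    for a, b in combinations(sulfurs, 2):
        d = math.dist(a.xyz, b.xyz)
        if d <= cutoff:
            hits.append((a, b, d))
    return hits


if __name__ == "__main__":
    toy = [
        Atom("A", 3, "CYS", "SG", (0.0, 0.0, 0.0)),
        Atom("A", 40, "CYS", "SG", (2.04, 0.0, 0.0)),
        Atom("A", 77, "CYS", "SG", (9.0, 0.0, 0.0)),
        Atom("A", 41, "MET", "SD", (1.0, 0.5, 0.0)),  # not a cysteine: ignored
    ]
    for a, b, d in find_disulfide_candidates(toy):
        print(a.res_num, b.res_num, round(d, 2))  # 3 40 2.04
```

The all-pairs loop is quadratic in the number of cysteine sulfurs, which is fine per structure; a scan of a whole archive would more likely use a spatial grid or k-d tree.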
8.
PLoS Comput Biol ; 13(6): e1005575, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28574982

ABSTRACT

Recent advances in experimental techniques have led to a rapid growth in complexity, size, and number of macromolecular structures that are made available through the Protein Data Bank. This creates a challenge for macromolecular visualization and analysis. Macromolecular structure files, such as PDB or PDBx/mmCIF files, can be slow to transfer and parse, and hard to incorporate into third-party software tools. Here, we present a new binary and compressed data representation, the MacroMolecular Transmission Format, MMTF, as well as software implementations in several languages that have been developed around it, which address these issues. We describe the new format and its APIs and demonstrate that it is several times faster to parse, and about a quarter of the file size of the current standard format, PDBx/mmCIF. As a consequence of the new data representation, it is now possible to visualize structures with millions of atoms in a web browser, keep the whole PDB archive in memory or parse it within a few minutes on average computers, which opens up a new way of thinking about how to design and implement efficient algorithms in structural bioinformatics. The PDB archive is available in MMTF file format through web services and data that are updated on a weekly basis.


Subject(s)
Computational Biology/methods , Databases, Chemical , Macromolecular Substances , Software , Internet , Macromolecular Substances/analysis , Macromolecular Substances/chemistry , Macromolecular Substances/classification , Molecular Structure
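The abstract does not describe how MMTF achieves its compression. The MMTF specification applies per-array encoding strategies, among them delta encoding and run-length encoding of integer arrays, inside a MessagePack container. The plain-Python sketch below shows only why those two steps pair well (sequential values become runs of identical small deltas); it does not reproduce the MMTF binary layout, headers or strategy codes, and the function names are hypothetical.

```python
from itertools import groupby


def delta_encode(values):
    """Replace each value by its difference from the previous one (first vs 0)."""
    out, prev = [], 0
    for v in values:
        out.append(v - prev)
        prev = v
    return out


def delta_decode(deltas):
    """Running sum restores the original values."""
    out, acc = [], 0
    for d in deltas:
        acc += d
        out.append(acc)
    return out


def run_length_encode(values):
    """Flatten (value, count) pairs: [7, 7, 7, 2] -> [7, 3, 2, 1]."""
    out = []
    for v, run in groupby(values):
        out.extend((v, sum(1 for _ in run)))
    return out


def run_length_decode(pairs):
    """Expand flattened (value, count) pairs back into the full list."""
    out = []
    for i in range(0, len(pairs), 2):
        out.extend([pairs[i]] * pairs[i + 1])
    return out


if __name__ == "__main__":
    serials = list(range(1, 11))       # sequential atom serial numbers 1..10
    packed = run_length_encode(delta_encode(serials))
    print(packed)                      # [1, 10]
    assert delta_decode(run_length_decode(packed)) == serials
```

Ten sequential integers collapse to two numbers; the same effect on long, regular per-atom arrays is one reason a binary, pre-encoded format can be much smaller and faster to parse than text-based mmCIF.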
9.
Bioinformatics ; 32(24): 3833-3835, 2016 12 15.
Article in English | MEDLINE | ID: mdl-27551105

ABSTRACT

The Protein Data Bank (PDB) now contains more than 120,000 three-dimensional (3D) structures of biological macromolecules. To allow an interpretation of how PDB data relate to other publicly available annotations, we developed a novel data integration platform that maps 3D structural information across various datasets. This integration bridges from the human genome across protein sequence to 3D structure space. We developed novel software solutions for data management and visualization, while incorporating new libraries for web-based visualization using SVG graphics. AVAILABILITY AND IMPLEMENTATION: The new views are available from http://www.rcsb.org and software is available from https://github.com/rcsb/. CONTACT: andreas.prlic@rcsb.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods , Databases, Protein , Protein Conformation , Software , Amino Acid Sequence , Computer Graphics , Genomics , Humans , User-Computer Interface
10.
Nucleic Acids Res ; 43(Database issue): D345-56, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25428375

ABSTRACT

The RCSB Protein Data Bank (RCSB PDB, http://www.rcsb.org) provides access to 3D structures of biological macromolecules and is one of the leading resources in biology and biomedicine worldwide. Our efforts over the past 2 years focused on enabling a deeper understanding of structural biology and providing new structural views of biology that support both basic and applied research and education. Herein, we describe recently introduced data annotations including integration with external biological resources, such as gene and drug databases, new visualization tools and improved support for the mobile web. We also describe access to data files, web services and open access software components to enable software developers to more effectively mine the PDB archive and related annotations. Our efforts are aimed at expanding the role of 3D structure in understanding biology and medicine.


Subject(s)
Databases, Protein , Protein Conformation , Binding Sites , Internet , Membrane Proteins/chemistry , Molecular Biology/education , Molecular Sequence Annotation , Multiprotein Complexes/chemistry , Peptides/chemistry , Pharmaceutical Preparations/chemistry , Research , Software
11.
Bioinformatics ; 31(8): 1316-8, 2015 Apr 15.
Article in English | MEDLINE | ID: mdl-25505094

ABSTRACT

MOTIVATION: Circular permutation is an important type of protein rearrangement. Natural circular permutations have implications for protein function, stability and evolution. Artificial circular permutations have also been used for protein studies. However, such relationships are difficult to detect for many sequence and structure comparison algorithms and require special consideration. RESULTS: We developed a new algorithm, called Combinatorial Extension for Circular Permutations (CE-CP), which allows the structural comparison of circularly permuted proteins. CE-CP was designed to be user friendly and is integrated into the RCSB Protein Data Bank. It was tested on two collections of circularly permuted proteins. Pairwise alignments can be visualized either in a desktop application or on the web using Jmol, and exported to other programs in a variety of formats. AVAILABILITY AND IMPLEMENTATION: The CE-CP algorithm can be accessed through the RCSB website at http://www.rcsb.org/pdb/workbench/workbench.do. Source code is available under the LGPL 2.1 as part of BioJava 3 (http://biojava.org; http://github.com/biojava/biojava). CONTACT: sbliven@ucsd.edu or info@rcsb.org.


Subject(s)
Algorithms , Computational Biology/methods , Databases, Protein , Dynamins/chemistry , Structural Homology, Protein , Humans , Programming Languages , Protein Structure, Tertiary , Sequence Analysis, Protein/methods
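CE-CP compares 3D structures, where permuted proteins are similar but not identical. The sketch below shows only the underlying notion at the sequence level, for exact permutations, and is not the CE-CP algorithm: `b` is a circular permutation of `a` exactly when both have the same length and `b` occurs in `a + a`, and the match position gives the new starting residue. The function name is hypothetical.

```python
from typing import Optional


def circular_permutation_offset(a: str, b: str) -> Optional[int]:
    """Return k such that b == a[k:] + a[:k], or None if b is not an exact
    circular permutation (rotation) of a.

    Every rotation of `a` is a substring of `a + a`, so one substring search
    over the doubled string tests all len(a) rotations at once."""
    if len(a) != len(b):
        return None
    k = (a + a).find(b)
    return k if k != -1 else None


if __name__ == "__main__":
    print(circular_permutation_offset("ABCDEFGH", "EFGHABCD"))  # 4
    print(circular_permutation_offset("ABCDEFGH", "EFGHABCX"))  # None
```

Because natural permutations usually come with substitutions and indels, an exact substring test misses most real cases; that is the gap an alignment method designed for permutations has to fill.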
12.
Bioinformatics ; 31(1): 126-7, 2015 Jan 01.
Article in English | MEDLINE | ID: mdl-25183487

ABSTRACT

SUMMARY: The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) resource provides tools for query, analysis and visualization of the 3D structures in the PDB archive. As the mobile Web is starting to surpass desktop and laptop usage, scientists and educators are beginning to integrate mobile devices into their research and teaching. In response, we have developed the RCSB PDB Mobile app for the iOS and Android mobile platforms to enable fast and convenient access to RCSB PDB data and services. Using the app, users from the general public to expert researchers can quickly search and visualize biomolecules, and add personal annotations via the RCSB PDB's integrated MyPDB service. AVAILABILITY AND IMPLEMENTATION: RCSB PDB Mobile is freely available from the Apple App Store and Google Play (http://www.rcsb.org).


Subject(s)
Computational Biology/methods , Computer Graphics , Databases, Protein , Mobile Applications , Software , Biomedical Research , Humans , User-Computer Interface , Workflow
13.
Nucleic Acids Res ; 41(Database issue): D475-82, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23193259

ABSTRACT

The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) develops tools and resources that provide a structural view of biology for research and education. The RCSB PDB web site (http://www.rcsb.org) uses the curated 3D macromolecular data contained in the PDB archive to offer unique methods to access, report and visualize data. Recent activities have focused on improving methods for simple and complex searches of PDB data, creating specialized access to chemical component data and providing domain-based structural alignments. New educational resources are offered at the PDB-101 educational view of the main web site such as Author Profiles that display a researcher's PDB entries in a timeline. To promote different kinds of access to the RCSB PDB, Web Services have been expanded, and an RCSB PDB Mobile application for the iPhone/iPad has been released. These improvements enable new opportunities for analyzing and understanding structure data.


Subject(s)
Databases, Protein , Protein Conformation , Biochemistry/education , Computer Graphics , Internet , Ligands , Protein Structure, Tertiary , Research , Structural Homology, Protein
14.
Bioinformatics ; 28(20): 2693-5, 2012 Oct 15.
Article in English | MEDLINE | ID: mdl-22877863

ABSTRACT

UNLABELLED: BioJava is an open-source project for processing of biological data in the Java programming language. We have recently released a new version (3.0.5), which is a major update to the code base that greatly extends its functionality. RESULTS: BioJava now consists of several independent modules that provide state-of-the-art tools for protein structure comparison, pairwise and multiple sequence alignments, working with DNA and protein sequences, analysis of amino acid properties, detection of protein modifications and prediction of disordered regions in proteins as well as parsers for common file formats using a biologically meaningful data model. AVAILABILITY: BioJava is an open-source project distributed under the Lesser GPL (LGPL). BioJava can be downloaded from the BioJava website (http://www.biojava.org). BioJava requires Java 1.6 or higher. All inquiries should be directed to the BioJava mailing lists. Details are available at http://biojava.org/wiki/BioJava:MailingLists.


Subject(s)
Proteins/chemistry , Sequence Analysis , Software , Amino Acids/chemistry , Computational Biology , Genomics , Protein Conformation , Protein Processing, Post-Translational , Sequence Alignment , Sequence Analysis, DNA , Sequence Analysis, Protein
15.
Nucleic Acids Res ; 39(Database issue): D392-401, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21036868

ABSTRACT

The RCSB Protein Data Bank (RCSB PDB) web site (http://www.pdb.org) has been redesigned to increase usability and to cater to a larger and more diverse user base. This article describes key enhancements and new features that fall into the following categories: (i) query and analysis tools for chemical structure searching, query refinement, tabulation and export of query results; (ii) web site customization and new structure alerts; (iii) pair-wise and representative protein structure alignments; (iv) visualization of large assemblies; (v) integration of structural data with the open access literature and binding affinity data; and (vi) web services and web widgets to facilitate integration of PDB data and tools with other resources. These improvements enable a range of new possibilities to analyze and understand structure data. The next generation of the RCSB PDB web site, as described here, provides a rich resource for research and education.


Subject(s)
Databases, Protein , Proteins/chemistry , Animals , Computer Graphics , Humans , Internet , Ligands , Mice , Protein Conformation , Systems Integration , User-Computer Interface
16.
Pac Symp Biocomput ; 28: 383-394, 2023.
Article in English | MEDLINE | ID: mdl-36540993

ABSTRACT

As the diversity of genomic variation data increases with our growing understanding of the role of variation in health and disease, it is critical to develop standards for precise inter-system exchange of these data for research and clinical applications. The Global Alliance for Genomics and Health (GA4GH) Variation Representation Specification (VRS) meets this need through a technical terminology and information model for disambiguating and concisely representing variation concepts. Here we discuss the recent Genotype model in VRS, which may be used to represent the allelic composition of a genetic locus. We demonstrate the use of the Genotype model and the constituent Haplotype model for the precise and interoperable representation of pharmacogenomic diplotypes, HGVS variants, and VCF records using VRS and discuss how this can be leveraged to enable interoperable exchange and search operations between assayed variation and genomic knowledgebases.


Subject(s)
Computational Biology , Genetic Variation , Humans , Databases, Genetic , Genomics , Genotype
17.
Bioinformatics ; 26(23): 2983-5, 2010 Dec 01.
Article in English | MEDLINE | ID: mdl-20937596

ABSTRACT

SUMMARY: With the continuous growth of the RCSB Protein Data Bank (PDB), providing an up-to-date systematic structure comparison of all protein structures poses an ever-growing challenge. Here, we present a comparison tool for calculating both 1D protein sequence and 3D protein structure alignments. This tool supports various applications at the RCSB PDB website. First, a structure alignment web service calculates pairwise alignments. Second, a stand-alone application runs alignments locally and visualizes the results. Third, pre-calculated 3D structure comparisons for the whole PDB are provided and updated on a weekly basis. These three applications allow users to discover novel relationships between proteins available either at the RCSB PDB or provided by the user. AVAILABILITY AND IMPLEMENTATION: A web user interface is available at http://www.rcsb.org/pdb/workbench/workbench.do. The source code is available under the LGPL license from http://www.biojava.org. A source bundle, prepared for local execution, is available from http://source.rcsb.org. CONTACT: andreas@sdsc.edu; pbourne@ucsd.edu.


Subject(s)
Databases, Protein , Software , Structural Homology, Protein , Algorithms , Amino Acid Sequence , Internet , Proteins/chemistry , User-Computer Interface
18.
BMC Bioinformatics ; 11: 220, 2010 Apr 29.
Article in English | MEDLINE | ID: mdl-20429930

ABSTRACT

BACKGROUND: Biological data have traditionally been stored and made publicly available through a variety of on-line databases, whereas biological knowledge has traditionally been found in the printed literature. With journals now on-line and providing an increasing amount of open access content, often free of copyright restriction, this distinction between database and literature is blurring. To exploit this opportunity we present the integration of open access literature with the RCSB Protein Data Bank (PDB). RESULTS: BioLit provides an enhanced view of articles with markup of semantic data and links to biological databases, based on the content of the article. For example, words matching existing biological ontologies are highlighted and database identifiers are linked to their database of origin. Among other functions, it identifies PDB IDs that are mentioned in the open access literature, by parsing the full text for all research articles in PubMed Central (PMC) and exposing the results as simple XML Web Services. Here, we integrate BioLit results with the RCSB PDB website by using these services to find PDB IDs that are mentioned in research articles and subsequently retrieving abstracts, figures, and text excerpts for those articles. A new RCSB PDB literature view permits browsing through the figures and abstracts of the articles that mention a given structure. The BioLit Web Services that are providing the underlying data are publicly accessible. A client library (Java) is provided that supports querying these services. CONCLUSIONS: The integration between literature and websites, as demonstrated here with the RCSB PDB, provides a broader view of how a given structure has been analyzed and used. This approach detects the mention of a PDB structure even if it is not formally cited in the paper. Other structures related through the same literature references can also be identified, possibly providing new scientific insight.
To our knowledge this is the first time that database and literature have been integrated in this way and it speaks to the opportunities afforded by open and free access to both database and literature content.


Subject(s)
Databases, Protein , Proteins/chemistry , Software , PubMed , Publications , Systems Integration , User-Computer Interface
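The abstract does not describe BioLit's text-mining rules. A naive first pass at finding PDB ID mentions in full text might look like the sketch below, which assumes the classic four-character PDB ID shape (a digit 1-9 followed by three alphanumerics) matched as a whole word, case-insensitively. The pattern, the year-skipping heuristic and the function name are all assumptions for illustration.

```python
import re
from typing import List

# Assumed classic PDB ID shape: a digit 1-9 followed by three alphanumerics,
# matched case-insensitively as a whole word.
_PDB_ID = re.compile(r"\b[1-9][A-Za-z0-9]{3}\b")


def find_pdb_ids(text: str) -> List[str]:
    """Return candidate PDB IDs in order of first appearance, upper-cased.

    Naive heuristic: all-digit tokens (years such as "2010") are skipped;
    tokens such as "5mer" can still produce false positives."""
    found: List[str] = []
    for m in _PDB_ID.finditer(text):
        token = m.group(0).upper()
        if token.isdigit():
            continue
        if token not in found:
            found.append(token)
    return found


if __name__ == "__main__":
    sentence = "We compared 4HHB with 1a3n (PDB) and with 4hhb again in 2010."
    print(find_pdb_ids(sentence))  # ['4HHB', '1A3N']
```

A production system would add context cues (for example "PDB" or "entry" nearby) and check candidates against the list of released entries, since a bare pattern cannot tell an identifier from an unrelated four-character token.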
19.
Bioinformatics ; 25(10): 1321-8, 2009 May 15.
Article in English | MEDLINE | ID: mdl-19420069

ABSTRACT

MOTIVATION: Ever increasing amounts of biological interaction data are being accumulated worldwide, but they are currently not readily accessible to the biologist at a single site. New techniques are required for retrieving, sharing and presenting data spread over the Internet. RESULTS: We introduce the DASMI system for the dynamic exchange, annotation and assessment of molecular interaction data. DASMI is based on the widely used Distributed Annotation System (DAS) and consists of a data exchange specification, web servers for providing the interaction data and clients for data integration and visualization. The decentralized architecture of DASMI affords the online retrieval of the most recent data from distributed sources and databases. DASMI can also be extended easily by adding new data sources and clients. We describe all DASMI components and demonstrate their use for protein and domain interactions. AVAILABILITY: The DASMI tools are available at http://www.dasmi.de/ and http://ipfam.sanger.ac.uk/graph. The DAS registry and the DAS 1.53E specification is found at http://www.dasregistry.org/.


Subject(s)
Computational Biology/methods , Protein Interaction Mapping , Software , Database Management Systems , Databases, Genetic , Internet , Proteins/chemistry , User-Computer Interface
20.
PLoS One ; 15(12): e0239883, 2020.
Article in English | MEDLINE | ID: mdl-33270643

ABSTRACT

MOTIVATION: Access to biological sequence data, such as genome, transcript, or protein sequence, is at the core of many bioinformatics analysis workflows. The National Center for Biotechnology Information (NCBI), Ensembl, and other sequence database maintainers provide methods to access sequences through network connections. For many users, the convenience and currency of remotely managed data are compelling, and the network latency is inconsequential. However, for high-throughput and clinical applications, local sequence collections are essential for performance, stability, privacy, and reproducibility. RESULTS: Here we describe SeqRepo, a novel system for building a local, high-performance, non-redundant collection of biological sequences. SeqRepo enables clients to use primary database identifiers and several digests to identify sequences and sequence aliases. SeqRepo provides a native Python interface and a REST interface, which can run locally and enables access from other programming languages. SeqRepo also provides an alternative REST interface based on the GA4GH refget protocol. SeqRepo provides fast random access to sequence slices. We provide results that demonstrate that a local SeqRepo sequence collection yields significant performance benefits of up to 1300-fold over remote sequence collections. In our use case for a variant validation and normalization pipeline, SeqRepo improved throughput 50-fold relative to using remote sequences. SeqRepo may be used with any species or sequence type. Regular snapshots of human sequence collections are available. It is often convenient or necessary to use a computed digest as a sequence identifier. For example, a digest-based identifier may be used to refer to proprietary reference genomes or segments of a graph genome, for which conventional identifiers will not be available.
Here we also introduce a convention for the application of the SHA-512 hashing algorithm with Base64 encoding to generate URL-safe identifiers. This convention, sha512t24u, combines a fast digest mechanism with a space-efficient representation that can be used for any object. Our report includes an analysis of timing and collision probabilities for sha512t24u. SeqRepo enables clients to use sha512t24u as identifiers, thereby seamlessly integrating public and private sequence sets. AVAILABILITY: SeqRepo is released under the Apache License 2.0 and is available on github and PyPi. Docker images and database snapshots are also available. See https://github.com/biocommons/biocommons.seqrepo.


Subject(s)
Computational Biology/methods , Databases, Nucleic Acid , Software , Algorithms , Genome, Human/genetics , Genomics/methods , Humans , Programming Languages , Reproducibility of Results
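The abstract names the ingredients of sha512t24u (SHA-512, Base64, URL-safe output). The sketch below assumes, as the "t24u" in the name suggests, that the digest is truncated to its first 24 bytes and encoded with the URL-safe Base64 alphabet. The input is hashed exactly as given; any normalization SeqRepo may apply to a sequence before hashing is outside this sketch.

```python
import base64
import hashlib


def sha512t24u(blob: bytes) -> str:
    """SHA-512 digest truncated to its first 24 bytes, then encoded with the
    URL-safe Base64 alphabet ('-' and '_' instead of '+' and '/').
    24 bytes = 192 bits encode to exactly 32 characters with no '=' padding."""
    digest = hashlib.sha512(blob).digest()[:24]
    return base64.urlsafe_b64encode(digest).decode("ascii")


if __name__ == "__main__":
    ident = sha512t24u(b"ACGT")
    print(ident, len(ident))  # prints the identifier followed by 32
```

Choosing 24 bytes keeps identifiers short while leaving 192 bits of digest: by the birthday approximation, the chance of any collision among n distinct objects is about n^2 / 2^193, which stays negligible even for billions of sequences.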