Results 1 - 20 of 47
1.
Nucleic Acids Res ; 52(D1): D134-D137, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37889039

ABSTRACT

GenBank® (https://www.ncbi.nlm.nih.gov/genbank/) is a comprehensive, public database that contains 25 trillion base pairs from over 3.7 billion nucleotide sequences for 557 000 formally described species. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. Recent updates include policies for including spatio-temporal metadata, clarified documentation for GenBank data processing, enhanced foreign contamination screening tools, new processes in the Submission Portal, migration of Entrez Genome and Assembly displays into NCBI Datasets, and the impending retirement of tbl2asn, replaced by table2asn.


Subject(s)
Databases, Nucleic Acid , Genomics , Base Sequence , Internet , Humans
2.
Nucleic Acids Res ; 52(D1): D33-D43, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37994677

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, SciENcv, the NIH Comparative Genomics Resource (CGR), NCBI Virus, SRA, RefSeq, foreign contamination screening tools, Taxonomy, iCn3D, ClinVar, GTR, MedGen, dbSNP, ALFA, ClinicalTrials.gov, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.


Subject(s)
Databases, Genetic , National Library of Medicine (U.S.) , Biotechnology/instrumentation , Databases, Nucleic Acid , Internet , United States
3.
Nucleic Acids Res ; 51(D1): D29-D38, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36370100

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. New resources include the Comparative Genome Resource (CGR) and the BLAST ClusteredNR database. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, IgBLAST, GDV, RefSeq, NCBI Virus, GenBank type assemblies, iCn3D, ClinVar, GTR, dbGaP, ALFA, ClinicalTrials.gov, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.


Subject(s)
Databases, Genetic , Databases, Nucleic Acid , United States , National Library of Medicine (U.S.) , Sequence Alignment , Biotechnology , Internet
4.
Nucleic Acids Res ; 51(D1): D141-D144, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36350640

ABSTRACT

GenBank® (https://www.ncbi.nlm.nih.gov/genbank/) is a comprehensive, public database that contains 19.6 trillion base pairs from over 2.9 billion nucleotide sequences for 504 000 formally described species. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. Recent updates include resources for data from the SARS-CoV-2 virus, NCBI Datasets, BLAST ClusteredNR, the Submission Portal, table2asn, a Foreign Contamination Screening tool and BioSample.


Subject(s)
Databases, Nucleic Acid , Humans , COVID-19/genetics , Genomics , SARS-CoV-2/genetics
5.
Nucleic Acids Res ; 50(D1): D161-D164, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34850943

ABSTRACT

GenBank® (https://www.ncbi.nlm.nih.gov/genbank/) is a comprehensive, public database that contains 15.3 trillion base pairs from over 2.5 billion nucleotide sequences for 504 000 formally described species. Recent updates include resources for data from the SARS-CoV-2 virus, including a SARS-CoV-2 landing page, NCBI Datasets, NCBI Virus and the Submission Portal. We also discuss upcoming changes to GI identifiers, a new data management interface for BioProject, and advice for providing contextual metadata in submissions.


Subject(s)
Databases, Nucleic Acid , Viruses/genetics , Genome, Viral , National Library of Medicine (U.S.) , SARS-CoV-2/genetics , United States , User-Computer Interface
6.
Nucleic Acids Res ; 50(D1): D20-D26, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34850941

ABSTRACT

The National Center for Biotechnology Information (NCBI) produces a variety of online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, RefSeq, SRA, Virus, dbSNP, dbVar, ClinicalTrials.gov, MMDB, iCn3D and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.


Subject(s)
Biotechnology/trends , Databases, Genetic/trends , Databases, Chemical , Databases, Nucleic Acid , Databases, Protein , Humans , Internet , National Library of Medicine (U.S.) , PubMed , United States
7.
Nat Biotechnol ; 39(9): 1141-1150, 2021 09.
Article in English | MEDLINE | ID: mdl-34504346

ABSTRACT

Clinical applications of precision oncology require accurate tests that can distinguish true cancer-specific mutations from errors introduced at each step of next-generation sequencing (NGS). To date, no bulk sequencing study has addressed the effects of cross-site reproducibility, nor the biological, technical and computational factors that influence variant identification. Here we report a systematic interrogation of somatic mutations in paired tumor-normal cell lines to identify factors affecting detection reproducibility and accuracy at six different centers. Using whole-genome sequencing (WGS) and whole-exome sequencing (WES), we evaluated the reproducibility of different sample types with varying input amount and tumor purity, and multiple library construction protocols, followed by processing with nine bioinformatics pipelines. We found that read coverage and callers affected both WGS and WES reproducibility, but WES performance was influenced by insert fragment size, genomic copy content and the global imbalance value (GIV; G > T/C > A). Finally, taking into account library preparation protocol, tumor content, read coverage and bioinformatics processes concomitantly, we recommend actionable practices to improve the reproducibility and accuracy of NGS experiments for cancer mutation detection.


Subject(s)
Benchmarking , Exome Sequencing/standards , Neoplasms/genetics , Sequence Analysis, DNA/standards , Whole Genome Sequencing/standards , Cell Line , Cell Line, Tumor , High-Throughput Nucleotide Sequencing/methods , Humans , Mutation , Neoplasms/pathology , Reproducibility of Results
8.
Nucleic Acids Res ; 49(D1): D92-D96, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33196830

ABSTRACT

GenBank® (https://www.ncbi.nlm.nih.gov/genbank/) is a comprehensive, public database that contains 9.9 trillion base pairs from over 2.1 billion nucleotide sequences for 478 000 formally described species. Daily data exchange with the European Nucleotide Archive and the DNA Data Bank of Japan ensures worldwide coverage. Recent updates include new resources for data from the SARS-CoV-2 virus, updates to the NCBI Submission Portal and associated submission wizards for dengue and SARS-CoV-2 viruses, new taxonomy queries for viruses and prokaryotes, and simplified submission processes for EST and GSS sequences.


Subject(s)
Computational Biology/statistics & numerical data , Databases, Nucleic Acid , Genomics/methods , SARS-CoV-2/genetics , Sequence Analysis, DNA/methods , Animals , COVID-19/epidemiology , COVID-19/virology , Computational Biology/methods , Humans , Information Storage and Retrieval/methods , Internet , Molecular Sequence Annotation/methods , Pandemics
9.
Nucleic Acids Res ; 49(D1): D10-D17, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33095870

ABSTRACT

The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. The Entrez system provides search and retrieval operations for most of these data from 34 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Custom implementations of the BLAST program provide sequence-based searching of many specialized datasets. New resources released in the past year include a new PubMed interface and NCBI datasets. Additional resources that were updated in the past year include PMC, Bookshelf, Genome Data Viewer, SRA, ClinVar, dbSNP, dbVar, Pathogen Detection, BLAST, Primer-BLAST, IgBLAST, iCn3D and PubChem. All of these resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.


Subject(s)
Databases, Genetic , National Library of Medicine (U.S.) , Computational Biology/methods , Databases, Chemical , Databases, Nucleic Acid , Databases, Protein , Genomics/methods , Humans , PubMed , United States
11.
Nat Biotechnol ; 38(11): 1347-1355, 2020 11.
Article in English | MEDLINE | ID: mdl-32541955

ABSTRACT

New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed a sequence-resolved benchmark set for identification of both false-negative and false-positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12,745 isolated, sequence-resolved insertion (7,281) and deletion (5,464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and 5,262 insertions and 4,095 deletions supported by ≥1 diploid assembly. We demonstrate that the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked- and long-read sequencing and optical mapping.


Subject(s)
Germ-Line Mutation/genetics , INDEL Mutation/genetics , Diploidy , Genomic Structural Variation , Humans , Molecular Sequence Annotation , Sequence Analysis, DNA
12.
Sci Data ; 6(1): 91, 2019 06 14.
Article in English | MEDLINE | ID: mdl-31201313

ABSTRACT

Single-molecule long-read sequencing datasets were generated for a son-father-mother trio of Han Chinese descent that is part of the Genome in a Bottle (GIAB) consortium portfolio. The dataset was generated using the Pacific Biosciences Sequel System. The son and each parent were sequenced to an average coverage of 60 and 30, respectively, with N50 subread lengths between 16 and 18 kb. Raw reads and reads aligned to both GRCh37 and GRCh38 are available at the NCBI GIAB ftp site (ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/ChineseTrio/). The GRCh38 aligned read data are archived in NCBI SRA (SRX4739017, SRX4739121, and SRX4739122). This dataset is available for anyone to develop and evaluate long-read bioinformatics methods.


Subject(s)
Asian People/genetics , Databases, Genetic , Genome, Human , Nuclear Family , China , Female , High-Throughput Nucleotide Sequencing , Humans , Male , Sequence Analysis, DNA
13.
Nat Biotechnol ; 37(5): 561-566, 2019 05.
Article in English | MEDLINE | ID: mdl-30936564

ABSTRACT

Benchmark small variant calls are required for developing, optimizing and assessing the performance of sequencing and bioinformatics methods. Here, as part of the Genome in a Bottle (GIAB) Consortium, we apply a reproducible, cloud-based pipeline to integrate multiple short- and linked-read sequencing datasets and provide benchmark calls for human genomes. We generate benchmark calls for one previously analyzed GIAB sample, as well as six genomes from the Personal Genome Project. These new genomes have broad, open consent, making this a 'first of its kind' resource that is available to the community for multiple downstream applications. We produce 17% more benchmark single nucleotide variations, 176% more indels and 12% larger benchmark regions than previously published GIAB benchmarks. We demonstrate that this benchmark reliably identifies errors in existing callsets and highlight challenges in interpreting performance metrics when using benchmarks that are not perfect or comprehensive. Finally, we identify strengths and weaknesses of callsets by stratifying performance according to variant type and genome context.


Subject(s)
Benchmarking , Computational Biology/trends , Genome, Human/genetics , Genomics/trends , Genetic Variation/genetics , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation/genetics , Polymorphism, Single Nucleotide , Software/trends
14.
Nat Biotechnol ; 37(4): 480, 2019 04.
Article in English | MEDLINE | ID: mdl-30894680

ABSTRACT

In the version of this article initially published, Lena Dolman's second affiliation was given as Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK. The correct second affiliation is Ontario Institute for Cancer Research, Toronto, Ontario, Canada. The error has been corrected in the HTML and PDF versions of the article.

16.
Eur J Hum Genet ; 26(12): 1721-1731, 2018 12.
Article in English | MEDLINE | ID: mdl-30069064

ABSTRACT

The Global Alliance for Genomics and Health (GA4GH) proposes a data access policy model, "registered access", to increase and improve access to data requiring an agreement to basic terms and conditions, such as the use of DNA sequence and health data in research. A registered access policy would enable a range of categories of users to gain access, starting with researchers and clinical care professionals. It would also facilitate general use and reuse of data but within the bounds of consent restrictions and other ethical obligations. In piloting registered access with the Scientific Demonstration data sharing projects of GA4GH, we provide additional ethics, policy and technical guidance to facilitate the implementation of this access model in an international setting.


Subject(s)
Access to Information , Genetics, Medical/standards , Genomics/standards , Information Dissemination , Genetics, Medical/ethics , Genetics, Medical/legislation & jurisprudence , Genomics/ethics , Genomics/legislation & jurisprudence , Humans , Licensure , Practice Guidelines as Topic
17.
PLoS One ; 12(6): e0179106, 2017.
Article in English | MEDLINE | ID: mdl-28609482

ABSTRACT

Genome-wide association studies (GWAS) usually rely on the assumption that different samples are not from closely related individuals. Detection of duplicates and close relatives becomes more difficult both statistically and computationally when one wants to combine datasets that may have been genotyped on different platforms. The dbGaP repository at the National Center for Biotechnology Information (NCBI) contains datasets from hundreds of studies with over one million samples. There are many duplicates and closely related individuals both within and across studies from different submitters. Relationships between studies cannot always be identified by the submitters of individual datasets. To aid in curation of dbGaP, we developed a rapid statistical method called Genetic Relationship and Fingerprinting (GRAF) to detect duplicates and closely related samples, even when the sets of genotyped markers differ and the DNA strand orientations are unknown. GRAF extracts genotypes of 10,000 informative and independent SNPs from genotype datasets obtained using different methods, and implements quick algorithms that enable it to find all of the duplicate pairs from more than 880,000 samples within and across dbGaP studies in less than two hours. In addition, GRAF uses two statistical metrics called All Genotype Mismatch Rate (AGMR) and Homozygous Genotype Mismatch Rate (HGMR) to determine subject relationships directly from the observed genotypes, without estimating probabilities of identity by descent (IBD), or kinship coefficients, and compares the predicted relationships with those reported in the pedigree files. We implemented GRAF in a freely available C++ program of the same name. In this paper, we describe the methods in GRAF and validate the usage of GRAF on samples from the dbGaP repository. Other scientists can use GRAF on their own samples and in combination with samples downloaded from dbGaP.


Subject(s)
Algorithms , Computational Biology/methods , Data Mining/methods , Databases, Nucleic Acid/statistics & numerical data , Genome-Wide Association Study/statistics & numerical data , Polymorphism, Single Nucleotide , Genome-Wide Association Study/methods , Genotype , Humans , Reproducibility of Results
18.
Nucleic Acids Res ; 45(D1): D819-D826, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899644

ABSTRACT

The database of Genotypes and Phenotypes (dbGaP) Data Browser (https://www.ncbi.nlm.nih.gov/gap/ddb/) was developed in response to requests from the scientific community for a resource that enables view-only access to summary-level information and individual-level genotype and sequence data associated with phenotypic features maintained in the controlled-access tier of dbGaP. Until now, the dbGaP controlled-access environment required investigators to submit a data access request, wait for Data Access Committee review, download each data set and locally examine them for potentially relevant information. Existing unrestricted-access genomic data browsing resources (e.g. http://evs.gs.washington.edu/EVS/, http://exac.broadinstitute.org/) provide only summary statistics or aggregate allele frequencies. The dbGaP Data Browser serves as a third solution, providing researchers with view-only access to a compilation of individual-level data from general research use (GRU) studies through a simplified controlled-access process. The National Institutes of Health (NIH) will continue to improve the Browser in response to user feedback and believes that this tool may decrease unnecessary download requests, while still facilitating responsible genomic data-sharing.


Subject(s)
Databases, Genetic , Genomics/methods , Genotype , Phenotype , Software , Web Browser , Computational Biology/methods , Genetic Association Studies/methods
19.
Sci Data ; 3: 160025, 2016 Jun 07.
Article in English | MEDLINE | ID: mdl-27271295

ABSTRACT

The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST), is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly.


Subject(s)
Benchmarking , Genome, Human , Exome , Genomics , Humans , INDEL Mutation
20.
PLoS Genet ; 12(1): e1005772, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26796797

ABSTRACT

The article describes a systematic way of recording data use conditions that are based on consent permissions as found in the datasets of the main public genome archives (NCBI dbGaP and EMBL-EBI/CRG EGA).


Subject(s)
Databases, Nucleic Acid , Genome , Genomic Library , Health Services Research