Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 5 de 5
Filtrar
Más filtros











Base de datos
Intervalo de año de publicación
1.
Nat Methods ; 21(6): 994-1002, 2024 Jun.
Artículo en Inglés | MEDLINE | ID: mdl-38755321

RESUMEN

Searching vast and rapidly growing nucleotide content in resources, such as runs in the Sequence Read Archive and assemblies for whole-genome shotgun sequencing projects in GenBank, is currently impractical for most researchers. Here we present Pebblescout, a tool that navigates such content by providing indexing and search capabilities. Indexing uses dense sampling of the sequences in the resource. Search finds subjects (runs or assemblies) that have short sequence matches to a user query, with well-defined guarantees and ranks them using informativeness of the matches. We illustrate the functionality of Pebblescout by creating eight databases that index over 3.7 petabases. The web service of Pebblescout can be reached at https://pebblescout.ncbi.nlm.nih.gov . We show that for a wide range of query lengths, Pebblescout provides a data-driven way for finding relevant subsets of large nucleotide resources, reducing the effort for downstream analysis substantially. We also show that Pebblescout results compare favorably to MetaGraph and Sourmash.


Asunto(s)
Programas Informáticos , Nucleótidos/genética , Humanos , Bases de Datos Genéticas , Biología Computacional/métodos , Bases de Datos de Ácidos Nucleicos , Algoritmos
2.
PLoS One ; 19(1): e0291406, 2024.
Artículo en Inglés | MEDLINE | ID: mdl-38241320

RESUMEN

Candida auris is a newly emerged multidrug-resistant fungus capable of causing invasive infections with high mortality. Despite intense efforts to understand how this pathogen rapidly emerged and spread worldwide, its environmental reservoirs are poorly understood. Here, we present a collaborative effort between the U.S. Centers for Disease Control and Prevention, the National Center for Biotechnology Information, and GridRepublic (a volunteer computing platform) to identify C. auris sequences in publicly available metagenomic datasets. We developed the MetaNISH pipeline that uses SRPRISM to align sequences to a set of reference genomes and computes a score for each reference genome. We used MetaNISH to scan ~300,000 SRA metagenomic runs from 2010 onwards and identified five datasets containing C. auris reads. Finally, GridRepublic has implemented a prospective C. auris molecular monitoring system using MetaNISH and volunteer computing.


Asunto(s)
Candida , Candidiasis , Humanos , Candida/genética , Candidiasis/microbiología , Candida auris , Estudios Prospectivos , Metagenómica , Antifúngicos/uso terapéutico
3.
BMC Genomics ; 17: 37, 2016 Jan 07.
Artículo en Inglés | MEDLINE | ID: mdl-26742787

RESUMEN

BACKGROUND: Xiphophorus fishes are represented by 26 live-bearing species of tropical fish that express many attributes (e.g., viviparity, genetic and phenotypic variation, ecological adaptation, varied sexual developmental mechanisms, ability to produce fertile interspecies hybrids) that have made attractive research models for over 85 years. Use of various interspecies hybrids to investigate the genetics underlying spontaneous and induced tumorigenesis has resulted in the development and maintenance of pedigreed Xiphophorus lines specifically bred for research. The recent availability of the X. maculatus reference genome assembly now provides unprecedented opportunities for novel and exciting comparative research studies among Xiphophorus species. RESULTS: We present sequencing, assembly and annotation of two new genomes representing Xiphophorus couchianus and Xiphophorus hellerii. The final X. couchianus and X. hellerii assemblies have total sizes of 708 Mb and 734 Mb and correspond to 98 % and 102 % of the X. maculatus Jp 163 A genome size, respectively. The rates of single nucleotide change range from 1 per 52 bp to 1 per 69 bp among the three genomes and the impact of putatively damaging variants are presented. In addition, a survey of transposable elements allowed us to deduce an ancestral TE landscape, uncovered potential active TEs and document a recent burst of TEs during evolution of this genus. CONCLUSIONS: Two new Xiphophorus genomes and their corresponding transcriptomes were efficiently assembled, the former using a novel guided assembly approach. Three assembled genome sequences within this single vertebrate order of new world live-bearing fishes will accelerate our understanding of relationship between environmental adaptation and genome evolution. In addition, these genome resources provide capability to determine allele specific gene regulation among interspecies hybrids produced by crossing any of the three species that are known to produce progeny predisposed to tumor development.


Asunto(s)
Ciprinodontiformes/genética , Variación Genética , Genoma , Transcriptoma/genética , Animales , Regulación de la Expresión Génica , Genómica , Especificidad de la Especie
4.
Genome Res ; 24(12): 2066-76, 2014 12.
Artículo en Inglés | MEDLINE | ID: mdl-25373144

RESUMEN

A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content show this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicate misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly.


Asunto(s)
Genoma Humano , Haplotipos , Mola Hidatiforme/genética , Alelos , Mapeo Cromosómico , Cromosomas Artificiales Bacterianos , Biología Computacional/métodos , Femenino , Genómica/métodos , Heterocigoto , Secuenciación de Nucleótidos de Alto Rendimiento , Humanos , Polimorfismo de Nucleótido Simple , Embarazo , Secuencias Repetitivas de Ácidos Nucleicos , Duplicaciones Segmentarias en el Genoma , Análisis de Secuencia de ADN
5.
Bioinformatics ; 23(21): 2949-51, 2007 Nov 01.
Artículo en Inglés | MEDLINE | ID: mdl-17921491

RESUMEN

MOTIVATION: The blastp and tblastn modules of BLAST are widely used methods for searching protein queries against protein and nucleotide databases, respectively. One heuristic used in BLAST is to consider only database sequences that contain a high-scoring match of length at most 5 to the query. We implemented the capability to use words of length 6 or 7. We demonstrate an improved trade-off between running time and retrieval accuracy, controlled by the score threshold used for short word matches. For example, the running time can be reduced by 20-30% while achieving ROC (receiver operator characteristic) scores similar to those obtained with current default parameters. AVAILABILITY: The option to use long words is in the NCBI C and C++ toolkit code for BLAST, starting with version 2.2.16 of blastall. A Linux executable used to produce the results herein is available at: ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/protein_longwords


Asunto(s)
Sistemas de Administración de Bases de Datos , Bases de Datos de Proteínas , Almacenamiento y Recuperación de la Información/métodos , Proteínas/química , Proteínas/genética , Alineación de Secuencia/métodos , Interfaz Usuario-Computador , Algoritmos , Gráficos por Computador
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA