ABSTRACT
The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, SciENcv, the NIH Comparative Genomics Resource (CGR), NCBI Virus, SRA, RefSeq, foreign contamination screening tools, Taxonomy, iCn3D, ClinVar, GTR, MedGen, dbSNP, ALFA, ClinicalTrials.gov, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.
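As an illustration of the E-utilities interface mentioned above, the following minimal sketch builds an ESearch request URL for the PubMed database. The base URL and the `db`, `term`, `retmax`, and `retmode` parameters follow the public E-utilities documentation; the helper name `esearch_url` and the example query are assumptions made here for illustration, and no network request is issued.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch request URL (hypothetical helper; no request is sent)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example query, chosen only for illustration.
url = esearch_url("pubmed", "antimicrobial resistance")
print(url)
```

Fetching the URL with any HTTP client would return matching record identifiers; heavier use is expected to follow NCBI's usage guidelines, for example by supplying an API key.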
Subject(s)
Databases, Genetic; National Library of Medicine (U.S.); Biotechnology/instrumentation; Databases, Nucleic Acid; Internet; United States
ABSTRACT
Antimicrobial resistance (AMR) is a major public health problem that requires publicly available tools for rapid analysis. To identify AMR genes in whole-genome sequences, the National Center for Biotechnology Information (NCBI) has produced AMRFinder, a tool that identifies AMR genes using a high-quality curated AMR gene reference database. The Bacterial Antimicrobial Resistance Reference Gene Database consists of up-to-date gene nomenclature, a set of hidden Markov models (HMMs), and a curated protein family hierarchy. Currently, it contains 4,579 antimicrobial resistance proteins and more than 560 HMMs. Here, we describe AMRFinder and its associated database. To assess the predictive ability of AMRFinder, we measured the consistency between predicted AMR genotypes from AMRFinder and resistance phenotypes of 6,242 isolates from the National Antimicrobial Resistance Monitoring System (NARMS). This included 5,425 Salmonella enterica, 770 Campylobacter spp., and 47 Escherichia coli isolates phenotypically tested against various antimicrobial agents. Of 87,679 susceptibility tests performed, 98.4% were consistent with predictions. To assess the accuracy of AMRFinder, we compared its gene symbol output with that of a 2017 version of ResFinder, another publicly available resistance gene detection system. Most gene calls were identical, but there were 1,229 gene symbol differences (8.8%) between them, with differences due to both algorithmic differences and database composition. AMRFinder missed 16 loci that ResFinder found, while ResFinder missed 216 loci that AMRFinder identified. Based on these results, AMRFinder appears to be a highly accurate AMR gene detection system.
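The genotype-phenotype consistency figure reported above can be thought of as the share of susceptibility tests in which the predicted and observed resistance calls agree. The sketch below shows that calculation under stated assumptions: each test is reduced to a pair of booleans, the function name `percent_consistent` is hypothetical, and the four toy records are invented for illustration and are not NARMS data.

```python
from typing import Iterable, Tuple


def percent_consistent(tests: Iterable[Tuple[bool, bool]]) -> float:
    """Return the percentage of tests where prediction and phenotype agree.

    Each item is (predicted_resistant, observed_resistant).
    """
    tests = list(tests)
    if not tests:
        raise ValueError("no susceptibility tests supplied")
    agree = sum(1 for predicted, observed in tests if predicted == observed)
    return 100.0 * agree / len(tests)


# Toy records (invented): three agreements and one disagreement.
toy = [(True, True), (False, False), (True, False), (False, False)]
print(f"{percent_consistent(toy):.1f}% consistent")
```

The actual study may have handled intermediate phenotypes or drug-specific rules differently; this sketch only captures the basic agreement ratio.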
ABSTRACT
BACKGROUND: The transcription factor SOX10 is essential for all stages of Schwann cell development including myelination. SOX10 cooperates with other transcription factors to activate the expression of key myelin genes in Schwann cells and is therefore a context-dependent, pro-myelination transcription factor. As such, the identification of genes regulated by SOX10 will provide insight into Schwann cell biology and related diseases. While genome-wide studies have successfully revealed SOX10 target genes, these efforts mainly focused on myelinating stages of Schwann cell development. We propose that less-biased approaches will reveal novel functions of SOX10 outside of myelination. RESULTS: We developed a stringent, computational-based screen for genome-wide identification of SOX10 response elements. Experimental validation of a pilot set of predicted binding sites in multiple systems revealed that SOX10 directly regulates a previously unreported alternative promoter at SOX6, which encodes a transcription factor that inhibits glial cell differentiation. We further explored the utility of our computational approach by combining it with DNase-seq analysis in cultured Schwann cells and previously published SOX10 ChIP-seq data from rat sciatic nerve. Remarkably, this analysis enriched for genomic segments that map to loci involved in the negative regulation of gliogenesis including SOX5, SOX6, NOTCH1, HMGA2, HES1, MYCN, ID4, and ID2. Functional studies in Schwann cells revealed that: (1) all eight loci are expressed prior to myelination and down-regulated subsequent to myelination; (2) seven of the eight loci harbor validated SOX10 binding sites; and (3) seven of the eight loci are down-regulated upon repressing SOX10 function. 
CONCLUSIONS: Our computational strategy revealed a putative novel function for SOX10 in Schwann cells, which suggests a model where SOX10 activates the expression of genes that inhibit myelination during non-myelinating stages of Schwann cell development. Importantly, the computational and functional datasets we present here will be valuable for the study of transcriptional regulation, SOX protein function, and glial cell biology.
Subject(s)
Cell Differentiation; Neuroglia/cytology; Neuroglia/metabolism; SOXE Transcription Factors/metabolism; Base Sequence; Cell Differentiation/genetics; Consensus Sequence; Conserved Sequence; Exons; Genomics/methods; High-Throughput Nucleotide Sequencing; Promoter Regions, Genetic; Regulatory Elements, Transcriptional; Response Elements; SOXE Transcription Factors/chemistry; SOXE Transcription Factors/genetics; Schwann Cells/metabolism
ABSTRACT
Analyses of DNA sequence datasets have repeatedly revealed inconsistencies in phylogenetic trees derived with different data. This is termed phylogenetic incongruence, and may arise from a methodological failure of the inference process or from biological processes, such as horizontal gene transfer, incomplete lineage sorting, and introgression. To better understand patterns of incongruence, we developed a method (PartFinder) that uses likelihood ratios applied to sliding windows for visualizing tree-support changes across genome-sequence alignments, allowing the comparative examination of complex phylogenetic scenarios among many species. As a pilot, we used PartFinder to investigate incongruence in the Homo-Pan-Gorilla group as well as Platyrrhini using high-quality bacterial artificial chromosome (BAC)-derived sequences as well as assembled whole-genome shotgun sequences. Our simulations verified the sensitivity of PartFinder, and our results were comparable to other studies of the Homo-Pan-Gorilla group. Analyses of the whole-genome alignments reveal significant associations between support for the accepted species relationship and specific characteristics of the genomic regions, such as GC-content, alignment score, exon content, and conservation. Finally, we analyzed sequence data generated for five platyrrhine species, and found incongruence that suggests a polytomy within Cebidae, in particular. Together, these studies demonstrate the utility of PartFinder for investigating the patterns of phylogenetic incongruence.
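The sliding-window idea described above can be sketched as follows. This is not PartFinder's implementation: it assumes that per-site log-likelihoods under two competing tree topologies are already available from some external program, and it reports the windowed log-likelihood difference as a simple stand-in for a likelihood ratio. The function name, window parameters, and toy values are all hypothetical.

```python
from typing import List, Sequence, Tuple


def windowed_log_lr(
    site_lnl_a: Sequence[float],
    site_lnl_b: Sequence[float],
    window: int,
    step: int,
) -> List[Tuple[int, float]]:
    """Return (window_start, lnL_A - lnL_B) for each window along the alignment."""
    if len(site_lnl_a) != len(site_lnl_b):
        raise ValueError("per-site likelihood vectors must have equal length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    results = []
    for start in range(0, len(site_lnl_a) - window + 1, step):
        end = start + window
        # Positive values favour tree A in this window, negative values tree B.
        diff = sum(site_lnl_a[start:end]) - sum(site_lnl_b[start:end])
        results.append((start, diff))
    return results


# Toy per-site log-likelihoods (invented): tree A fits the first half better.
tree_a = [-1.0, -1.0, -2.0, -2.0]
tree_b = [-2.0, -2.0, -1.0, -1.0]
print(windowed_log_lr(tree_a, tree_b, window=2, step=2))
```

Plotting the second element of each pair against the window start gives a rough picture of where along an alignment support shifts from one topology to another.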
Subject(s)
Classification/methods; Genome/genetics; Phylogeny; Research Design; Software; Animals; Base Sequence; Chromosomes, Artificial, Bacterial/genetics; Computer Simulation; Data Interpretation, Statistical; Humans; Likelihood Functions; Molecular Sequence Data; Primates/genetics; Sequence Alignment/methods; Sequence Analysis, DNA
ABSTRACT
Antimicrobial resistance (AMR) is a significant public health threat. Low-cost whole-genome sequencing, which is often used in surveillance programmes, provides an opportunity to assess AMR gene content in these genomes using in silico approaches. A variety of bioinformatic tools have been developed to identify these genomic elements. Most of those tools rely on reference databases of nucleotide or protein sequences and collections of models and rules for analysis. While the tools are critical for the identification of AMR genes, the databases themselves also provide significant utility for researchers, for applications ranging from sequence analysis to information about AMR phenotypes. Additionally, these databases can be evaluated by domain experts and others to ensure their accuracy. Here we describe how we curate the genes, point mutations and BLAST rules, and hidden Markov models used in NCBI's AMRFinderPlus, along with the quality-control steps we take to ensure database quality. We also describe the web interfaces that display the full structure of the database and their newly developed cross-browser relationships. Then, using the Reference Gene Catalog as an example, we detail how the databases, rules and models are made publicly available, as well as how to access the software. In addition, as part of the Pathogen Detection system, we have analysed over 1 million publicly available genomes using AMRFinderPlus and its databases. We discuss how the computed analyses generated by those tools can be accessed through a web interface. Finally, we conclude with NCBI's plans to make these databases accessible over the long term.
Subject(s)
Computational Biology; Software; Amino Acid Sequence; Whole Genome Sequencing
ABSTRACT
Antimicrobial resistance (AMR) is a significant public health threat. With the rise of affordable whole genome sequencing, in silico approaches to assessing AMR gene content can be used to detect known resistance mechanisms and potentially identify novel mechanisms. To enable accurate assessment of AMR gene content, as part of a multi-agency collaboration, NCBI developed a comprehensive AMR gene database, the Bacterial Antimicrobial Resistance Reference Gene Database, and the AMR gene detection tool AMRFinder. Here, we describe the expansion of the Reference Gene Database, now called the Reference Gene Catalog, to include putative acid, biocide, metal, and stress resistance genes, in addition to virulence genes and species-specific point mutations. Genes and point mutations are classified by broad functions, as well as more detailed functions. As we have expanded both the functional repertoire of identified genes and functionality, NCBI released a new version of AMRFinder, known as AMRFinderPlus. This new tool allows users the option to utilize only the core set of AMR elements, or to include stress response and virulence genes as well. AMRFinderPlus can detect acquired genes and point mutations in both protein and nucleotide sequence. In addition, the evidence used to identify the gene has been expanded to include whether nucleotide or protein sequence was used, its location in the contig, and presence of an internal stop codon. These database improvements and functional expansions will enable increased precision in identifying AMR genes, linking AMR genotypes and phenotypes, and determining possible relationships between AMR, virulence, and stress response.
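The choice between the core AMR set and the expanded set described above maps onto command-line options of the `amrfinder` program. The sketch below only assembles an argument list and does not run anything. The `--nucleotide`, `--organism`, and `--plus` flags are given as recalled from the AMRFinderPlus documentation and should be checked against `amrfinder --help` for the installed version; the helper name and the file name `assembly.fa` are hypothetical.

```python
from typing import List, Optional


def amrfinder_command(
    nucleotide_fasta: str,
    plus: bool = False,
    organism: Optional[str] = None,
) -> List[str]:
    """Assemble an amrfinder argument list (hypothetical helper; nothing is executed)."""
    cmd = ["amrfinder", "--nucleotide", nucleotide_fasta]
    if organism:
        # Assumed to enable organism-specific point-mutation screening.
        cmd += ["--organism", organism]
    if plus:
        # Assumed to add stress response and virulence genes to the core AMR set.
        cmd.append("--plus")
    return cmd


# Hypothetical input file; core-only versus expanded analysis.
print(amrfinder_command("assembly.fa"))
print(amrfinder_command("assembly.fa", plus=True, organism="Salmonella"))
```

The resulting list could be passed to `subprocess.run` on a machine where AMRFinderPlus and its database are installed.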
Subject(s)
Anti-Bacterial Agents/pharmacology; Bacteria/drug effects; Databases, Genetic; Drug Resistance, Bacterial/genetics; Genes, Bacterial; Bacteria/genetics; Bacteria/pathogenicity; Drug Resistance, Multiple, Bacterial/genetics; Genome, Bacterial; Mercury/pharmacology; Plasmids; Salmonella/drug effects; Salmonella/genetics; Virulence/genetics
ABSTRACT
The ongoing generation of prodigious amounts of genomic sequence data from myriad vertebrates is providing unparalleled opportunities for establishing definitive phylogenetic relationships among species. The size and complexities of such comparative sequence data sets not only allow smaller and more difficult branches to be resolved but also present unique challenges, including large computational requirements and the negative consequences of systematic biases. To explore these issues and to clarify the phylogenetic relationships among mammals, we have analyzed a large data set of over 60 megabase pairs (Mb) of high-quality genomic sequence, which we generated from 41 mammals and 3 other vertebrates. All sequences are orthologous to a 1.9-Mb region of the human genome that encompasses the cystic fibrosis transmembrane conductance regulator gene (CFTR). To understand the characteristics and challenges associated with phylogenetic analyses of such a large data set, we partitioned the sequence data in several ways and utilized maximum likelihood, maximum parsimony, and Neighbor-Joining algorithms, implemented in parallel on Linux clusters. These studies yielded well-supported phylogenetic trees, largely confirming other recent molecular phylogenetic analyses. Our results provide support for rooting the placental mammal tree between Atlantogenata (Xenarthra and Afrotheria) and Boreoeutheria (Euarchontoglires and Laurasiatheria), illustrate the difficulty in resolving some branches even with large amounts of data (e.g., in the case of Laurasiatheria), and demonstrate the valuable role that very large comparative sequence data sets can play in refining our understanding of the evolutionary relationships of vertebrates.
Subject(s)
Mammals/classification; Sequence Analysis, DNA; Animals; Chromosome Mapping; Chromosomes, Human, Pair 7; Conserved Sequence; Humans; Molecular Sequence Data; Phylogeny; Sequence Alignment; Species Specificity
ABSTRACT
The transcription factor SOX10 is mutated in the human neurocristopathy Waardenburg-Shah syndrome (WS4), which is characterized by enteric aganglionosis and pigmentation defects. SOX10 directly regulates genes expressed in neural crest lineages, including the enteric ganglia and melanocytes. Although some SOX10 target genes have been reported, the mechanisms by which SOX10 expression is regulated remain elusive. Here, we describe a transgene-insertion mutant mouse line (Hry) that displays partial enteric aganglionosis, a loss of melanocytes, and decreased Sox10 expression in homozygous embryos. Mutation analysis of Sox10 coding sequences was negative, suggesting that non-coding regulatory sequences are disrupted. To isolate the Hry molecular defect, Sox10 genomic sequences were collected from multiple species, comparative sequence analysis was performed and software was designed (ExactPlus) to identify identical sequences shared among species. Mutation analysis of conserved sequences revealed a 15.9 kb deletion located 47.3 kb upstream of Sox10 in Hry mice. ExactPlus revealed three clusters of highly conserved sequences within the deletion, one of which shows strong enhancer potential in cultured melanocytes. These studies: (i) present a novel hypomorphic Sox10 mutation that results in a WS4-like phenotype in mice; (ii) demonstrate that a 15.9 kb deletion underlies the observed phenotype and likely removes sequences essential for Sox10 expression; (iii) combine a novel in silico method for comparative sequence analysis with in vitro functional assays to identify candidate regulatory sequences deleted in this strain. These studies will direct further analyses of Sox10 regulation and provide candidate sequences for mutation detection in WS4 patients lacking a SOX10-coding mutation.
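The notion of identifying identical sequences shared among species can be illustrated with a simple exact-match search. This is not the ExactPlus algorithm, whose details are not given here: the sketch merely intersects the sets of length-k substrings from each species, and the function name, the value of k, and the toy sequences are all invented for illustration.

```python
from typing import Dict, Set


def shared_kmers(seqs: Dict[str, str], k: int) -> Set[str]:
    """Return every length-k substring present in all supplied sequences."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not seqs:
        return set()
    kmer_sets = [
        {seq[i:i + k].upper() for i in range(len(seq) - k + 1)}
        for seq in seqs.values()
    ]
    # Only k-mers found in every species survive the intersection.
    return set.intersection(*kmer_sets)


# Toy sequences (invented), keyed by species label.
toy_seqs = {
    "human": "ACGTACGGA",
    "mouse": "TTACGGAT",
    "chicken": "GGTACGGC",
}
print(shared_kmers(toy_seqs, k=5))
```

A practical tool would also need to work on aligned orthologous regions, report coordinates, and merge overlapping matches into longer conserved blocks; none of that is attempted here.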
Subject(s)
Base Sequence/genetics; Embryo, Mammalian/metabolism; Gene Expression Regulation, Developmental; High Mobility Group Proteins/metabolism; Sequence Deletion/genetics; Transcription Factors/metabolism; Waardenburg Syndrome/genetics; Algorithms; Animals; Blotting, Southern; Cells, Cultured; Conserved Sequence/genetics; DNA Mutational Analysis; Gene Components; High Mobility Group Proteins/genetics; In Situ Hybridization; Luciferases; Mice; Mice, Transgenic; Molecular Sequence Data; SOXE Transcription Factors; Sequence Analysis, DNA; Species Specificity; Transcription Factors/genetics; Transgenes/genetics
ABSTRACT
The identification of noncoding functional elements within vertebrate genomes, such as those that regulate gene expression, is a major challenge. Comparisons of orthologous sequences from multiple species are effective at detecting highly conserved regions and can reveal potential regulatory sequences. The GDF6 gene controls developmental patterning of skeletal joints and is associated with numerous, distant cis-acting regulatory elements. Using sequence data from 14 vertebrate species, we performed novel multispecies comparative analyses to detect highly conserved sequences flanking GDF6. The complementary tools WebMCS and ExactPlus identified a series of multispecies conserved sequences (MCSs). Of particular interest are MCSs within noncoding regions previously shown to contain GDF6 regulatory elements. A previously reported conserved sequence at -64 kb was also detected by both WebMCS and ExactPlus. Analysis of LacZ-reporter transgenic mice revealed that a 440-bp segment from this region contains an enhancer for Gdf6 expression in developing proximal limb joints. Several other MCSs represent candidate GDF6 regulatory elements; many of these are not conserved in fish or frog, but are strongly conserved in mammals.
Subject(s)
Bone Morphogenetic Proteins/genetics; Enhancer Elements, Genetic; Joints/metabolism; Response Elements; Animals; Base Sequence; Cats; Cattle; Conserved Sequence; DNA/genetics; Gene Expression; Growth Differentiation Factor 6; Humans; Mice; Mice, Transgenic; Molecular Sequence Data; Promoter Regions, Genetic; Rats
ABSTRACT
Comparison is a fundamental tool for analyzing DNA sequence. Interspecies sequence comparison is particularly powerful for inferring genome function and is based on the simple premise that conserved sequences are likely to be important. Thus, the comparison of a genomic sequence with its orthologous counterpart from another species is increasingly becoming an integral component of genome analysis. In ideal situations, such comparisons are performed with orthologous sequences from multiple species. To facilitate multispecies comparative sequence analysis, a robust and scalable strategy for simultaneously constructing sequence-ready bacterial artificial chromosome (BAC) contig maps from targeted genomic regions has been developed. Central to this approach is the generation and utilization of "universal" oligonucleotide-based hybridization probes ("overgo" probes), which are designed from sequences that are highly conserved between distantly related species. Large collections of these probes are used en masse to screen BAC libraries from multiple species in parallel, with the isolated clones assembled into physical contig maps. To validate the effectiveness of this strategy, efforts were focused on the construction of BAC-based physical maps from multiple mammalian species (chimpanzee, baboon, cat, dog, cow, and pig). Using available human and mouse genomic sequence and a newly developed computer program to design the requisite probes, sequence-ready maps were constructed in all species for a series of targeted regions totaling approximately 16 Mb in the human genome. The described approach can be used to facilitate the multispecies comparative sequencing of targeted genomic regions and can be adapted for constructing BAC contig maps in other vertebrates.