1.
Nucleic Acids Res ; 51(D1): D690-D699, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36263822

ABSTRACT

The Comprehensive Antibiotic Resistance Database (CARD; card.mcmaster.ca) combines the Antibiotic Resistance Ontology (ARO) with curated AMR gene (ARG) sequences and resistance-conferring mutations to provide an informatics framework for annotation and interpretation of resistomes. As of version 3.2.4, CARD encompasses 6627 ontology terms, 5010 reference sequences, 1933 mutations, 3004 publications, and 5057 AMR detection models that can be used by the accompanying Resistance Gene Identifier (RGI) software to annotate genomic or metagenomic sequences. Focused curation enhancements since 2020 include expanded β-lactamase curation, incorporation of likelihood-based AMR mutations for Mycobacterium tuberculosis, addition of disinfectants and antiseptics plus their associated ARGs, and systematic curation of resistance-modifying agents. This expanded curation includes 180 new AMR gene families, 15 new drug classes, 1 new resistance mechanism, and two new ontological relationships: evolutionary_variant_of and is_small_molecule_inhibitor. In silico prediction of resistomes and prevalence statistics of ARGs has been expanded to 377 pathogens, 21,079 chromosomes, 2,662 genomic islands, 41,828 plasmids and 155,606 whole-genome shotgun assemblies, resulting in collation of 322,710 unique ARG allele sequences. New features include the CARD:Live collection of community-submitted isolate resistome data and the introduction of standardized 15-character CARD Short Names for ARGs to support machine learning efforts.


Subject(s)
Data Curation ; Databases, Factual ; Drug Resistance, Microbial ; Machine Learning ; Anti-Bacterial Agents/pharmacology ; Genes, Bacterial ; Likelihood Functions ; Software ; Molecular Sequence Annotation
2.
Nucleic Acids Res ; 48(D1): D517-D525, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31665441

ABSTRACT

The Comprehensive Antibiotic Resistance Database (CARD; https://card.mcmaster.ca) is a curated resource providing reference DNA and protein sequences, detection models and bioinformatics tools on the molecular basis of bacterial antimicrobial resistance (AMR). CARD focuses on providing high-quality reference data and molecular sequences within a controlled vocabulary, the Antibiotic Resistance Ontology (ARO), designed by the CARD biocuration team to integrate with software development efforts for resistome analysis and prediction, such as CARD's Resistance Gene Identifier (RGI) software. Since 2017, CARD has expanded through extensive curation of reference sequences, revision of the ontological structure, curation of over 500 new AMR detection models, development of a new classification paradigm and expansion of analytical tools. Most notably, a new Resistomes & Variants module provides analysis and statistical summary of in silico predicted resistance variants from 82 pathogens and over 100 000 genomes. By adding these resistance variants to CARD, we are able to summarize predicted resistance using the information included in CARD, identify trends in AMR mobility and determine previously undescribed and novel resistance variants. Here, we describe updates and recent expansions to CARD and its biocuration process, including new resources for community biocuration of AMR molecular reference data.


Subject(s)
Databases, Genetic ; Drug Resistance, Bacterial ; Genes, Bacterial ; Software ; Bacteria/drug effects ; Bacteria/genetics ; Bacterial Proteins/chemistry ; Bacterial Proteins/genetics ; Bacterial Proteins/metabolism
3.
Bioinformatics ; 32(8): 1275-7, 2016 04 15.
Article in English | MEDLINE | ID: mdl-26656932

ABSTRACT

MOTIVATION: There are various reasons for rerunning bioinformatics tools and pipelines on sequencing data, including reproducing a past result, validation of a new tool or workflow using a known dataset, or tracking the impact of database changes. For identical results to be achieved, regularly updated reference sequence databases must be versioned and archived. Database administrators have tried to fill the requirements by supplying users with one-off versions of databases, but these are time consuming to set up and are inconsistent across resources. Disk storage and data backup performance has also discouraged maintaining multiple versions of databases since databases such as NCBI nr can consume 50 Gb or more disk space per version, with growth rates that parallel Moore's law. RESULTS: Our end-to-end solution combines our own Kipper software package (a simple key-value large file versioning system) with BioMAJ (software for downloading sequence databases) and Galaxy (a web-based bioinformatics data processing platform). Available versions of databases can be recalled and used by command-line and Galaxy users. The Kipper data store format makes publishing curated FASTA databases convenient since in most cases it can store a range of versions into a file marginally larger than the size of the latest version. AVAILABILITY AND IMPLEMENTATION: Kipper v1.0.0 and the Galaxy Versioned Data tool are written in Python and released as free and open source software available at https://github.com/Public-Health-Bioinformatics/kipper and https://github.com/Public-Health-Bioinformatics/versioned_data, respectively; detailed setup instructions can be found at https://github.com/Public-Health-Bioinformatics/versioned_data/blob/master/doc/setup.md CONTACT: Damion.Dooley@Bccdc.Ca or William.Hsiao@Bccdc.Ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
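
The abstract above describes a key-value versioning store that holds a range of database versions in a file only marginally larger than the latest version. The sketch below illustrates that general idea only and is not Kipper's actual file format or API: the class names `Entry` and `VersionedStore`, and the methods `commit` and `checkout`, are hypothetical. It assumes each key (for example a FASTA record identifier) carries a short history of values tagged with the version in which each value appeared and, if applicable, disappeared.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    """One value of a key, with the version range in which it is present."""
    value: str
    created: int                    # first version containing this value
    deleted: Optional[int] = None   # first version lacking it (None = still live)


@dataclass
class VersionedStore:
    """Hypothetical key-value store that records only differences between versions."""
    latest: int = 0
    entries: Dict[str, List[Entry]] = field(default_factory=dict)

    def commit(self, snapshot: Dict[str, str]) -> int:
        """Register a full snapshot as the next version and return its number."""
        self.latest += 1
        version = self.latest
        # Close live entries whose key vanished or whose value changed.
        for key, history in self.entries.items():
            live = history[-1]
            if live.deleted is None and snapshot.get(key) != live.value:
                live.deleted = version
        # Open entries for keys that are new or whose value changed.
        for key, value in snapshot.items():
            history = self.entries.setdefault(key, [])
            if not history or history[-1].deleted is not None:
                history.append(Entry(value, created=version))
        return version

    def checkout(self, version: int) -> Dict[str, str]:
        """Rebuild the snapshot exactly as it was at the requested version."""
        if not 1 <= version <= self.latest:
            raise ValueError(f"unknown version: {version}")
        result: Dict[str, str] = {}
        for key, history in self.entries.items():
            for entry in history:
                still_present = entry.deleted is None or version < entry.deleted
                if entry.created <= version and still_present:
                    result[key] = entry.value
        return result
```

Because an unchanged record is stored once regardless of how many versions contain it, the store grows with the number of changes rather than with the number of versions, which is the property the abstract highlights.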


Subject(s)
Computational Biology ; Databases, Nucleic Acid ; Software ; User-Computer Interface
4.
Appl Environ Microbiol ; 81(14): 4827-34, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25956776

ABSTRACT

Giardia is the most common parasitic cause of gastrointestinal infections worldwide, with transmission through surface water playing an important role in various parts of the world. Giardia duodenalis (synonyms: G. intestinalis and G. lamblia), a multispecies complex, has two zoonotic subtypes, assemblages A and B. When British Columbia (BC), a western Canadian province, experienced several waterborne giardiasis outbreaks due to unfiltered surface drinking water in the late 1980s, collection of isolates from surface water, as well as from humans and beavers (Castor canadensis), throughout the province was carried out. To better understand Giardia in surface water, 71 isolates, including 29 from raw surface water samples, 29 from human giardiasis cases, and 13 from beavers in watersheds from this historical library were characterized by PCR. Study isolates also included isolates from waterborne giardiasis outbreaks. Both assemblages A and B were identified in surface water, human, and beaver samples, including a mixture of both assemblages A and B in waterborne outbreaks. PCR results were confirmed by whole-genome sequencing (WGS) for one waterborne outbreak and supported the clustering of human, water, and beaver isolates within both assemblages. We concluded that contamination of surface water by Giardia is complex, that the majority of our surface water isolates were assemblage B, and that both assemblages A and B may cause waterborne outbreaks. The higher-resolution data provided by WGS warrant further study to better understand the spread of Giardia.


Subject(s)
Fresh Water/parasitology ; Giardia lamblia/classification ; Giardia lamblia/isolation & purification ; British Columbia ; Genome, Protozoan ; Genotype ; Giardia lamblia/genetics ; Giardiasis/parasitology ; Humans ; Molecular Sequence Data ; Phylogeny ; Polymerase Chain Reaction
5.
Bioinformatics ; 29(8): 1004-10, 2013 Apr 15.
Article in English | MEDLINE | ID: mdl-23457040

ABSTRACT

MOTIVATION: High-accuracy de novo assembly of the short sequencing reads from RNA-Seq technology is very challenging. We introduce a de novo assembly algorithm, EBARDenovo, which stands for Extension, Bridging And Repeat-sensing Denovo. This algorithm uses an efficient chimera-detection function to abrogate the effect of aberrant chimeric reads in RNA-Seq data. RESULTS: EBARDenovo resolves the complications of RNA-Seq assembly arising from sequencing errors, repetitive sequences and aberrant chimeric amplicons. In a series of assembly experiments, our algorithm is the most accurate among the examined programs, including de Bruijn graph assemblers, Trinity and Oases. AVAILABILITY AND IMPLEMENTATION: EBARDenovo is available at http://ebardenovo.sourceforge.net/. This software package (with patent pending) is free of charge for academic use only. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms ; Gene Expression Profiling/methods ; Sequence Analysis, RNA/methods ; RNA/chemistry ; Repetitive Sequences, Nucleic Acid ; Software
6.
Microb Genom ; 10(6), 2024 Jun.
Article in English | MEDLINE | ID: mdl-38860884

ABSTRACT

As public health laboratories expand their genomic sequencing and bioinformatics capacity for the surveillance of different pathogens, labs must carry out robust validation, training, and optimization of wet- and dry-lab procedures. Achieving these goals for algorithms, pipelines and instruments often requires that lower quality datasets be made available for analysis and comparison alongside those of higher quality. This range of data quality in reference sets can complicate the sharing of sub-optimal datasets that are vital for the community and for the reproducibility of assays. Sharing of useful, but sub-optimal datasets requires careful annotation and documentation of known issues to enable appropriate interpretation, avoid being mistaken for better quality information, and for these data (and their derivatives) to be easily identifiable in repositories. Unfortunately, there are currently no standardized attributes or mechanisms for tagging poor-quality datasets, or datasets generated for a specific purpose, to maximize their utility, searchability, accessibility and reuse. The Public Health Alliance for Genomic Epidemiology (PHA4GE) is an international community of scientists from public health, industry and academia focused on improving the reproducibility, interoperability, portability, and openness of public health bioinformatic software, skills, tools and data. To address the challenges of sharing lower quality datasets, PHA4GE has developed a set of standardized contextual data tags, namely fields and terms, that can be included in public repository submissions as a means of flagging pathogen sequence data with known quality issues, increasing their discoverability. 
The contextual data tags were developed through consultations with the community including input from the International Nucleotide Sequence Database Collaboration (INSDC), and have been standardized using ontologies, which are community-based resources for defining the tag properties and the relationships between them. The standardized tags are agnostic to the organism and the sequencing technique used and thus can be applied to data generated from any pathogen using an array of sequencing techniques. The tags can also be applied to synthetic (lab created) data. The list of standardized tags is maintained by PHA4GE and can be found at https://github.com/pha4ge/contextual_data_QC_tags. Definitions, ontology IDs, examples of use, as well as a JSON representation, are provided. The PHA4GE QC tags were tested, and are now implemented, by the FDA's GenomeTrakr laboratory network as part of its routine submission process for SARS-CoV-2 wastewater surveillance. We hope that these simple, standardized tags will help improve communication regarding quality control in public repositories, in addition to making datasets of variable quality more easily identifiable. Suggestions for additional tags can be submitted to PHA4GE via the New Term Request Form in the GitHub repository. By providing a mechanism for feedback and suggestions, we also expect that the tags will evolve with the needs of the community.


Subject(s)
Computational Biology ; Public Health ; Quality Control ; Humans ; Computational Biology/methods ; Information Dissemination/methods ; Reproducibility of Results ; Molecular Sequence Annotation/methods ; Genomics/methods ; Software
7.
Bioinformatics ; 28(14): 1947-8, 2012 Jul 15.
Article in English | MEDLINE | ID: mdl-22576174

ABSTRACT

UNLABELLED: Analysis of microbial genomes often requires the general organization and comparison of tens to thousands of genomes both from public repositories and unpublished sources. MicrobeDB provides a foundation for such projects by the automation of downloading published, completed bacterial and archaeal genomes from key sources, parsing annotations of all genomes (both public and private) into a local database, and allowing interaction with the database through an easy to use programming interface. MicrobeDB creates a simple to use, easy to maintain, centralized local resource for various large-scale comparative genomic analyses and a back-end for future microbial application design. AVAILABILITY: MicrobeDB is freely available under the GNU-GPL at: http://github.com/mlangill/microbedb/


Subject(s)
Databases, Genetic ; Genome, Archaeal ; Genome, Bacterial ; Computational Biology/methods ; Genomics ; Internet ; Software ; User-Computer Interface
8.
PLoS One ; 18(6): e0286728, 2023.
Article in English | MEDLINE | ID: mdl-37267413

ABSTRACT

An application ontology often reuses terms from other related, compatible ontologies. The extent of this interconnectedness is not readily apparent when browsing through larger textual presentations of term class hierarchies, be it Manchester text format OWL files or within an ontology editor like Protege. Users must either note ontology sources in term identifiers, or look at ontology import file term origins. Diagrammatically, this same information may be easier to perceive in 2-dimensional network or hierarchical graphs that visually code ontology term origins. However, humans, having stereoscopic vision and navigational acuity around colored and textured shapes, should benefit even more from a coherent 3-dimensional interactive visualization of ontology that takes advantage of perspective to offer both foreground focus on content and a stable background context. We present OntoTrek, a 3D ontology visualizer that enables ontology stakeholders (students, software developers, curation teams, and funders) to recognize the presence of imported terms and their domains, ultimately illustrating how projects can capture knowledge through a vocabulary of interwoven community-supported ontology resources.


Subject(s)
Imaging, Three-Dimensional ; Software ; Humans
9.
BMJ Open ; 13(2): e066418, 2023 02 07.
Article in English | MEDLINE | ID: mdl-36750286

ABSTRACT

OBJECTIVES: COVID-19 research has significantly contributed to pandemic response and the enhancement of public health capacity. COVID-19 data collected by provincial/territorial health authorities in Canada are valuable for research advancement yet not readily available to the public, including researchers. To inform developments in public health data-sharing in Canada, we explored Canadians' opinions of public health authorities sharing deidentified individual-level COVID-19 data publicly. DESIGN/SETTING/INTERVENTIONS/OUTCOMES: A national cross-sectional survey was administered in Canada in March 2022, assessing Canadians' opinions on publicly sharing COVID-19 datatypes. Market research firm Léger was employed for recruitment and data collection. PARTICIPANTS: Anyone greater than or equal to 18 years and currently living in Canada. RESULTS: 4981 participants completed the survey with a 92.3% response rate. 79.7% were supportive of provincial/territorial authorities publicly sharing deidentified COVID-19 data, while 20.3% were hesitant/averse/unsure. Datatypes most supported for being shared publicly were symptoms (83.0% in support), geographical region (82.6%) and COVID-19 vaccination status (81.7%). Datatypes with the most aversion were employment sector (27.4% averse), postal area (26.7%) and international travel history (19.7%). Generally supportive Canadians were characterised as being ≥50 years, with higher education, and being vaccinated against COVID-19 at least once. Vaccination status was the most influential predictor of data-sharing opinion, with respondents who were ever vaccinated being 4.20 times more likely (95% CI 3.21 to 5.48, p=0.000) to be generally supportive of data-sharing than those unvaccinated. CONCLUSIONS: These findings suggest that the Canadian public is generally favourable to deidentified data-sharing. 
Identifying factors that are likely to improve attitudes towards data-sharing is useful to stakeholders involved in data-sharing initiatives, such as public health agencies, in informing the development of public health communication and data-sharing policies. As Canada progresses through the COVID-19 pandemic, and with limited testing and reporting of COVID-19 data, it is essential to improve deidentified data-sharing given the public's general support for these efforts.


Subject(s)
COVID-19 ; Humans ; Cross-Sectional Studies ; Public Opinion ; Pandemics ; COVID-19 Vaccines ; Canada
10.
Microb Genom ; 9(1), 2023 01.
Article in English | MEDLINE | ID: mdl-36748616

ABSTRACT

Pathogen genomics is a critical tool for public health surveillance, infection control, outbreak investigations as well as research. In order to make use of pathogen genomics data, they must be interpreted using contextual data (metadata). Contextual data include sample metadata, laboratory methods, patient demographics, clinical outcomes and epidemiological information. However, the variability in how contextual information is captured by different authorities and how it is encoded in different databases poses challenges for data interpretation, integration and their use/re-use. The DataHarmonizer is a template-driven spreadsheet application for harmonizing, validating and transforming genomics contextual data into submission-ready formats for public or private repositories. The tool's web browser-based JavaScript environment enables validation and its offline functionality and local installation increases data security. The DataHarmonizer was developed to address the data sharing needs that arose during the COVID-19 pandemic, and was used by members of the Canadian COVID Genomics Network (CanCOGeN) to harmonize SARS-CoV-2 contextual data for national surveillance and for public repository submission. In order to support coordination of international surveillance efforts, we have partnered with the Public Health Alliance for Genomic Epidemiology to also provide a template conforming to its SARS-CoV-2 contextual data specification for use worldwide. Templates are also being developed for One Health and foodborne pathogens. Overall, the DataHarmonizer tool improves the effectiveness and fidelity of contextual data capture as well as its subsequent usability. Harmonization of contextual information across authorities, platforms and systems globally improves interoperability and reusability of data for concerted public health and research initiatives to fight the current pandemic and future public health emergencies. 
While initially developed for the COVID-19 pandemic, its expansion to other data management applications and pathogens is already underway.


Subject(s)
COVID-19 ; Humans ; COVID-19/epidemiology ; Pandemics ; SARS-CoV-2/genetics ; Canada ; Genomics/methods
11.
BMC Genomics ; 13 Suppl 7: S5, 2012.
Article in English | MEDLINE | ID: mdl-23282223

ABSTRACT

BACKGROUND: Mitochondrial dysfunction is associated with various aging diseases. The copy number of mtDNA in human cells may therefore be a potential biomarker for diagnostics of aging. Here we propose a new computational method for the accurate assessment of mtDNA copies from whole genome sequencing data. RESULTS: Two families of the human whole genome sequencing datasets from the HapMap and the 1000 Genomes projects were used for the accurate counting of mitochondrial DNA copy numbers. The results revealed that the parental mitochondrial DNA copy numbers are significantly lower than those of their children in these samples. There are 8% to 21% more copies of mtDNA in samples from the children than from their parents. The experiment demonstrated the possible correlations between the quantity of mitochondrial DNA and aging-related diseases. CONCLUSIONS: Since the next-generation sequencing technology strives to deliver affordable and non-biased sequencing results, accurate assessment of mtDNA copy numbers can be achieved effectively from the output of whole genome sequencing. We implemented the method as a software package MitoCounter with the source code and user's guide available to the public at http://sourceforge.net/projects/mitocounter/.
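
The abstract above does not state how MitoCounter computes copy numbers, so the following is not its method. As a rough illustration of how mtDNA copy number can be estimated from whole genome sequencing output, the sketch uses a commonly applied coverage-ratio approximation: mitochondrial read depth divided by autosomal read depth, scaled by nuclear ploidy. The function names and the example depths are hypothetical.

```python
def estimate_mtdna_copies(mt_mean_depth: float,
                          autosomal_mean_depth: float,
                          ploidy: int = 2) -> float:
    """Approximate mtDNA copies per cell from mean sequencing depths.

    Assumes reads are sampled uniformly, so depth is proportional to the
    number of template copies; each cell carries `ploidy` autosomal copies.
    """
    if autosomal_mean_depth <= 0:
        raise ValueError("autosomal depth must be positive")
    if mt_mean_depth < 0:
        raise ValueError("mitochondrial depth cannot be negative")
    return ploidy * mt_mean_depth / autosomal_mean_depth


def relative_increase(child_copies: float, parent_copies: float) -> float:
    """Fractional excess of the child's copy number over the parent's."""
    if parent_copies <= 0:
        raise ValueError("parent copy number must be positive")
    return (child_copies - parent_copies) / parent_copies
```

With made-up depths of 3000x over the mitochondrial genome and 30x over the autosomes, the estimate is 200 copies per cell; a child at 242 copies against a parent at 200 would sit at the upper end of the 8% to 21% range reported in the abstract.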


Subject(s)
DNA, Mitochondrial/metabolism ; Genome, Human ; Mitochondria/genetics ; Adult ; Child ; Databases, Genetic ; Female ; Humans ; Male ; Sequence Analysis, DNA ; Software
12.
Microb Genom ; 8(12), 2022 12.
Article in English | MEDLINE | ID: mdl-36748524

ABSTRACT

The White-Kauffmann-Le Minor (WKL) scheme is the most widely used Salmonella typing scheme for reporting the disease prevalence of the enteric pathogen. With the advent of whole-genome sequencing (WGS), in silico methods have increasingly replaced traditional serotyping due to reproducibility, speed and coverage. However, despite integrating genomic-based typing by in silico serotyping tools such as SISTR, in silico serotyping in certain contexts remains ambiguous and insufficiently informative. Specifically, in silico serotyping does not attempt to resolve polyphyly. Furthermore, in spite of the widespread acknowledgement of polyphyly from genomic studies, the prevalence of polyphyletic serovars is not well characterized. Here, we applied a genomics approach to acquire the necessary resolution to classify genetically discordant serovars and propose an alternative typing scheme that consistently reflects natural Salmonella populations. By accessing the unprecedented volume of bacterial genomic data publicly available in GenomeTrakr and PubMLST databases (>180 000 genomes representing 723 serovars), we characterized the global Salmonella population structure and systematically identified putative non-monophyletic serovars. The proportion of putative non-monophyletic serovars was estimated to be higher than in previous reports, reinforcing the inability of antigenic determinants to depict the complexity of Salmonella evolutionary history. We explored the extent of genetic diversity masked by serotyping labels and found significant intra-serovar molecular differences across many clinically important serovars. To avoid false discovery due to incorrect in silico serotyping calls, we cross-referenced reported serovar labels and concluded a low error rate in in silico serotyping. The combined application of clustering statistics and genome-wide association methods demonstrated effective characterization of stable bacterial populations and explained functional differences. 
The collective methods adopted in our study have practical values in establishing genomic-based typing nomenclatures for an entire microbial species or closely related subpopulations. Ultimately, we foresee an improved typing scheme to be a hybrid that integrates both genomic and antigenic information such that the resolution from WGS is leveraged to improve the precision of subpopulation classification while preserving the common names defined by the WKL scheme.


Subject(s)
Salmonella enterica ; Salmonella enterica/genetics ; Reproducibility of Results ; Genome-Wide Association Study ; Salmonella/genetics ; Genomics
13.
Gigascience ; 11, 2022 02 16.
Article in English | MEDLINE | ID: mdl-35169842

ABSTRACT

BACKGROUND: The Public Health Alliance for Genomic Epidemiology (PHA4GE) (https://pha4ge.org) is a global coalition that is actively working to establish consensus standards, document and share best practices, improve the availability of critical bioinformatics tools and resources, and advocate for greater openness, interoperability, accessibility, and reproducibility in public health microbial bioinformatics. In the face of the current pandemic, PHA4GE has identified a need for a fit-for-purpose, open-source SARS-CoV-2 contextual data standard. RESULTS: As such, we have developed a SARS-CoV-2 contextual data specification package based on harmonizable, publicly available community standards. The specification can be implemented via a collection template, as well as an array of protocols and tools to support both the harmonization and submission of sequence data and contextual information to public biorepositories. CONCLUSIONS: Well-structured, rich contextual data add value, promote reuse, and enable aggregation and integration of disparate datasets. Adoption of the proposed standard and practices will better enable interoperability between datasets and systems, improve the consistency and utility of generated data, and ultimately facilitate novel insights and discoveries in SARS-CoV-2 and COVID-19. The package is now supported by the NCBI's BioSample database.


Subject(s)
COVID-19 ; SARS-CoV-2 ; Genomics ; Humans ; Metadata ; Public Health ; Reproducibility of Results
14.
Front Genet ; 12: 716541, 2021.
Article in English | MEDLINE | ID: mdl-35401651

ABSTRACT

COVID-19 was declared to be a pandemic in March 2020 by the World Health Organization. Timely sharing of viral genomic sequencing data accompanied by a minimal set of contextual data is essential for informing regional, national, and international public health responses. Such contextual data is also necessary for developing and improving clinical therapies and vaccines, and enhancing the scientific community's understanding of the SARS-CoV-2 virus. The Canadian COVID-19 Genomics Network (CanCOGeN) was launched in April 2020 to coordinate and upscale existing genomics-based COVID-19 research and surveillance efforts. CanCOGeN is performing large-scale sequencing of both the genomes of SARS-CoV-2 virus samples (VirusSeq) and affected Canadians (HostSeq). This paper addresses the privacy concerns associated with sharing the viral sequence data with a pre-defined set of contextual data describing the sample source and case attribute of the sequence data in the Canadian context. Currently, the viral genome sequences are shared by provincial public health laboratories and their healthcare and academic partners, with the Canadian National Microbiology Laboratory and with publicly accessible databases. However, data sharing delays and the provision of incomplete contextual data often occur because publicly releasing such data triggers privacy and data governance concerns. The CanCOGeN Ethics and Governance Expert Working Group thus has investigated several privacy issues cited by CanCOGeN data providers/stewards. This paper addresses these privacy concerns and offers insights primarily in the Canadian context, although similar privacy considerations also exist in other jurisdictions. We maintain that sharing viral sequencing data and its limited associated contextual data in the public domain generally does not pose insurmountable privacy challenges. 
However, privacy risks associated with reidentification should be actively monitored due to advancements in reidentification methods and the evolving pandemic landscape. We also argue that during a global health emergency such as COVID-19, privacy should not be used as a blanket measure to prevent such genomic data sharing due to the significant benefits it provides towards public health responses and ongoing research activities.

15.
BMC Bioinformatics ; 9: 329, 2008 Aug 05.
Article in English | MEDLINE | ID: mdl-18680607

ABSTRACT

BACKGROUND: Genomic islands (GIs) are clusters of genes in prokaryotic genomes of probable horizontal origin. GIs are disproportionately associated with microbial adaptations of medical or environmental interest. Recently, multiple programs for automated detection of GIs have been developed that utilize sequence composition characteristics, such as G+C ratio and dinucleotide bias. To robustly evaluate the accuracy of such methods, we propose that a dataset of GIs be constructed using criteria that are independent of sequence composition-based analysis approaches. RESULTS: We developed a comparative genomics approach (IslandPick) that identifies both very probable islands and non-island regions. The approach involves 1) flexible, automated selection of comparative genomes for each query genome, using a distance function that picks appropriate genomes for identification of GIs, 2) identification of regions unique to the query genome, compared with the chosen genomes (positive dataset) and 3) identification of regions conserved across all genomes (negative dataset). Using our constructed datasets, we investigated the accuracy of several sequence composition-based GI prediction tools. CONCLUSION: Our results indicate that AlienHunter has the highest recall, but the lowest measured precision, while SIGI-HMM is the most precise method. SIGI-HMM and IslandPath/DIMOB have comparable overall highest accuracy. Our comparative genomics approach, IslandPick, was the most accurate, compared with a curated list of GIs, indicating that we have constructed suitable datasets. This represents the first evaluation, using diverse and independent datasets that were not artificially constructed, of the accuracy of several sequence composition-based GI predictors. The caveats associated with this analysis and proposals for optimal island prediction are discussed.
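
The abstract above scores genomic island predictors against a positive dataset (regions unique to the query genome) and a negative dataset (regions conserved across genomes), reporting precision, recall and accuracy. The abstract does not give the exact counting rules, so the sketch below is an illustration under stated assumptions: counts are made per nucleotide position, predicted bases falling outside both datasets are ignored, and the names `Interval`, `to_positions` and `evaluate_predictor` are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end) coordinates


def to_positions(intervals: Iterable[Interval]) -> Set[int]:
    """Expand intervals into the set of nucleotide positions they cover."""
    positions: Set[int] = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def evaluate_predictor(predicted: Iterable[Interval],
                       positive: Iterable[Interval],
                       negative: Iterable[Interval]) -> Dict[str, float]:
    """Score predicted islands against positive and negative reference regions."""
    pred = to_positions(predicted)
    pos = to_positions(positive)
    neg = to_positions(negative)

    tp = len(pred & pos)   # island bases correctly predicted
    fp = len(pred & neg)   # conserved bases wrongly called island
    fn = len(pos - pred)   # island bases missed
    tn = len(neg - pred)   # conserved bases correctly left alone

    total = tp + fp + fn + tn
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }
```

Under this scheme a tool that calls very large regions tends toward high recall and low precision, the pattern the abstract reports for AlienHunter, while a conservative tool shows the opposite trade-off.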


Subject(s)
Algorithms ; Genomic Islands/genetics ; Genomics/methods ; Base Composition ; Base Sequence ; Software
16.
PLoS Genet ; 1(5): e62, 2005 Nov.
Article in English | MEDLINE | ID: mdl-16299586

ABSTRACT

Microbial genes that are "novel" (no detectable homologs in other species) have become of increasing interest as environmental sampling suggests that there are many more such novel genes in yet-to-be-cultured microorganisms. By analyzing known microbial genomic islands and prophages, we developed criteria for systematic identification of putative genomic islands (clusters of genes of probable horizontal origin in a prokaryotic genome) in 63 prokaryotic genomes, and then characterized the distribution of novel genes and other features. All but a few of the genomes examined contained significantly higher proportions of novel genes in their predicted genomic islands compared with the rest of their genome (Paired t test = 4.43E-14 to 1.27E-18, depending on method). Moreover, the reverse observation (i.e., higher proportions of novel genes outside of islands) never reached statistical significance in any organism examined. We show that this higher proportion of novel genes in predicted genomic islands is not due to less accurate gene prediction in genomic island regions, but likely reflects a genuine increase in novel genes in these regions for both bacteria and archaea. This represents the first comprehensive analysis of novel genes in prokaryotic genomic islands and provides clues regarding the origin of novel genes. Our collective results imply that there are different gene pools associated with recently horizontally transmitted genomic regions versus regions that are primarily vertically inherited. Moreover, there are more novel genes within the gene pool associated with genomic islands. Since genomic islands are frequently associated with a particular microbial adaptation, such as antibiotic resistance, pathogen virulence, or metal resistance, this suggests that microbes may have access to a larger "arsenal" of novel genes for adaptation than previously thought.


Subject(s)
Archaeal Genome , Bacterial Genome , Genomic Islands , Bacteriophages , Archaeal Genes , Bacterial Genes , Genetic Models , Statistical Models , Open Reading Frames , Sequence Alignment
17.
NPJ Sci Food ; 2: 23, 2018.
Article in English | MEDLINE | ID: mdl-31304272

ABSTRACT

The construction of high-capacity data sharing networks to support increasing government and commercial data exchange has highlighted a key roadblock: the content of existing Internet-connected information remains siloed due to a multiplicity of local languages and data dictionaries. This lack of a digital lingua franca is obvious in the domain of human food as materials travel from their wild or farm origin, through processing and distribution chains, to consumers. A well-defined, hierarchical vocabulary connected by logical relationships (in other words, an ontology) is urgently needed to help tackle data harmonization problems that span the domains of food security, safety, quality, production, distribution, and consumer health and convenience. FoodOn (http://foodon.org) is a consortium-driven project to build a comprehensive and easily accessible global farm-to-fork ontology about food that accurately and consistently describes foods commonly known in cultures from around the world. FoodOn addresses food product terminology gaps and supports food traceability. Focusing on human and domesticated animal food description, FoodOn contains animal and plant food sources, food categories and products, and other facets like preservation processes, contact surfaces, and packaging. Much of FoodOn's vocabulary comes from transforming LanguaL, a mature and popular food indexing thesaurus, into a World Wide Web Consortium (W3C) OWL Web Ontology Language-formatted vocabulary that provides system interoperability, quality control, and software-driven intelligence. FoodOn complements other technologies facilitating food traceability, which is becoming critical in this age of increasing globalization of food networks.

18.
Am J Infect Control ; 45(2): 170-179, 2017 Feb 01.
Article in English | MEDLINE | ID: mdl-28159067

ABSTRACT

With the growing importance of infectious diseases in health care and communicable disease outbreaks garnering increasing attention, new technologies are playing a greater role in helping us prevent health care-associated infections and provide optimal public health. The microbiology laboratory has always played a large role in infection control by providing tools to identify, characterize, and track pathogens. Recently, advances in DNA sequencing technology have ushered in a new era of genomic epidemiology, where traditional molecular diagnostics and genotyping methods are being enhanced and even replaced by genomics-based methods to aid epidemiologic investigations of communicable diseases. The ability to analyze and compare entire pathogen genomes has allowed for unprecedented resolution into how and why infectious diseases spread. As these genomics-based methods continue to improve in speed, cost, and accuracy, they will be increasingly used to inform and guide infection control and public health practices.


Subject(s)
Cross Infection/diagnosis , Cross Infection/prevention & control , Infection Control/methods , Molecular Epidemiology/methods , Genotyping Techniques/methods , High-Throughput Nucleotide Sequencing/methods , Humans
19.
Front Microbiol ; 8: 1068, 2017.
Article in English | MEDLINE | ID: mdl-28694792

ABSTRACT

Globalization of food networks increases opportunities for the spread of foodborne pathogens beyond borders and jurisdictions. High resolution whole-genome sequencing (WGS) subtyping of pathogens promises to vastly improve our ability to track and control foodborne disease, but to do so it must be combined with epidemiological, clinical, laboratory and other health care data (called "contextual data") to be meaningfully interpreted for regulatory and health interventions, outbreak investigation, and risk assessment. However, current multi-jurisdictional pathogen surveillance and investigation efforts are complicated by time-consuming data re-entry, curation and integration of contextual information owing to a lack of interoperable standards and inconsistent reporting. A solution to these challenges is the use of 'ontologies': hierarchies of well-defined and standardized vocabularies interconnected by logical relationships. Terms are specified by universal IDs enabling integration into highly regulated areas and multi-sector sharing (e.g., food and water microbiology with the veterinary sector). Institution-specific terms can be mapped to a given standard at different levels of granularity, maximizing comparability of contextual information according to jurisdictional policies. Fit-for-purpose ontologies provide contextual information with the auditability required for food safety laboratory accreditation. Our research efforts include the development of a Genomic Epidemiology Ontology (GenEpiO) and a Food Ontology (FoodOn) that harmonize important laboratory, clinical and epidemiological data fields, as well as existing food resources. These efforts are supported by a global consortium of researchers and stakeholders. Since foodborne diseases do not respect international borders, uptake of such vocabularies will be crucial for multi-jurisdictional interpretation of WGS results and data sharing.

20.
J Mol Biol ; 348(4): 817-30, 2005 May 13.
Article in English | MEDLINE | ID: mdl-15843015

ABSTRACT

Salmonella enterica serovar Typhimurium is lysogenized by several temperate bacteriophages that encode lysogenic conversion genes, which can act as virulence factors during infection and contribute to the genetic diversity and pathogenic potential of the lysogen. We have investigated the temperate bacteriophage called Gifsy-1 in S. enterica serovar Typhimurium and show here that the product of the gogB gene encoded within this phage shares similarity with proteins from other Gram-negative pathogens. The amino-terminal portion of GogB shares similarity with leucine-rich repeat-containing virulence-associated proteins from other Gram-negative pathogens, whereas the carboxyl-terminal portion of GogB shares similarity with uncharacterized proteins in other pathogens. We show that GogB is secreted by both type III secretion systems encoded in Salmonella Pathogenicity Island-1 (SPI-1) and SPI-2, but translocation into host cells is a SPI-2-mediated process. Once translocated, GogB localizes to the cytoplasm of infected host cells. The genetic regulation of gogB in Salmonella is influenced by the transcriptional activator, SsrB, under SPI-2-inducing conditions, but the modular nature of the gogB gene allows for autonomous expression and type III secretion following horizontal gene transfer into a heterologous pathogen. These data define the first autonomously expressed lysogenic conversion gene within Gifsy-1 that acts as a modular and promiscuous type III-secreted substrate of the infection process.


Subject(s)
Viral Gene Expression Regulation , Salmonella Phages/genetics , Salmonella Phages/metabolism , Salmonella typhimurium/metabolism , Salmonella typhimurium/virology , Viral Proteins/genetics , Viral Proteins/metabolism , Amino Acid Sequence , HeLa Cells , Humans , Molecular Sequence Data , Mutation/genetics , Prophages/genetics , Prophages/metabolism , Prophages/pathogenicity , Protein Transport , Salmonella Phages/pathogenicity , Salmonella typhimurium/chemistry , Salmonella typhimurium/genetics , Sequence Alignment , Substrate Specificity , Viral Proteins/chemistry , Virulence Factors/chemistry , Virulence Factors/genetics , Virulence Factors/metabolism