Results 1 - 20 of 28
1.
PLoS One ; 18(6): e0286728, 2023.
Article in English | MEDLINE | ID: mdl-37267413

ABSTRACT

An application ontology often reuses terms from other related, compatible ontologies. The extent of this interconnectedness is not readily apparent when browsing through large textual presentations of term class hierarchies, whether Manchester text format OWL files or an ontology editor such as Protégé. Users must either note the ontology sources encoded in term identifiers or inspect the term origins in ontology import files. Diagrammatically, the same information may be easier to perceive in two-dimensional network or hierarchical graphs that visually encode ontology term origins. However, humans, having stereoscopic vision and navigational acuity around colored and textured shapes, should benefit even more from a coherent, interactive three-dimensional visualization of an ontology that uses perspective to offer both a foreground focus on content and a stable background context. We present OntoTrek, a 3D ontology visualizer that enables ontology stakeholders (students, software developers, curation teams, and funders) to recognize the presence of imported terms and their domains, ultimately illustrating how projects can capture knowledge through a vocabulary of interwoven, community-supported ontology resources.


Subject(s)
Imaging, Three-Dimensional , Software , Humans
2.
Microb Genom ; 9(1), 2023 01.
Article in English | MEDLINE | ID: mdl-36748616

ABSTRACT

Pathogen genomics is a critical tool for public health surveillance, infection control, outbreak investigations and research. To make use of pathogen genomics data, they must be interpreted using contextual data (metadata). Contextual data include sample metadata, laboratory methods, patient demographics, clinical outcomes and epidemiological information. However, variability in how contextual information is captured by different authorities and how it is encoded in different databases poses challenges for data interpretation, integration and use/re-use. The DataHarmonizer is a template-driven spreadsheet application for harmonizing, validating and transforming genomics contextual data into submission-ready formats for public or private repositories. The tool's browser-based JavaScript environment enables validation, while its offline functionality and local installation increase data security. The DataHarmonizer was developed to address the data-sharing needs that arose during the COVID-19 pandemic, and was used by members of the Canadian COVID Genomics Network (CanCOGeN) to harmonize SARS-CoV-2 contextual data for national surveillance and public repository submission. To support coordination of international surveillance efforts, we have partnered with the Public Health Alliance for Genomic Epidemiology (PHA4GE) to also provide a template conforming to its SARS-CoV-2 contextual data specification for worldwide use. Templates are also being developed for One Health and foodborne pathogens. Overall, the DataHarmonizer improves the effectiveness and fidelity of contextual data capture as well as the data's subsequent usability. Harmonizing contextual information across authorities, platforms and systems globally improves the interoperability and reusability of data for concerted public health and research initiatives to fight the current pandemic and future public health emergencies.
Although the DataHarmonizer was initially developed for the COVID-19 pandemic, its expansion to other data management applications and pathogens is already underway.
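The idea of template-driven validation can be illustrated with a small Python sketch. It is not the DataHarmonizer's code (the tool itself runs as JavaScript in the browser), and the field names, picklists and date pattern below are hypothetical stand-ins for a real template such as the SARS-CoV-2 contextual data specification. Each rule lives in a data structure, and one generic function checks a spreadsheet row against it.

```python
import re
from typing import Dict, List

# Hypothetical template: each field maps to a rule. Real DataHarmonizer
# templates are far richer; these names and picklists are illustrative only.
TEMPLATE: Dict[str, dict] = {
    "sample_collection_date": {"required": True, "pattern": r"\d{4}-\d{2}-\d{2}"},
    "host_common_name": {"required": True, "allowed": {"Human", "Bat", "Cat", "Dog"}},
    "purpose_of_sequencing": {
        "required": False,
        "allowed": {"Baseline surveillance", "Outbreak investigation"},
    },
}


def validate_row(row: Dict[str, str], template: Dict[str, dict]) -> List[str]:
    """Return a list of human-readable problems found in one spreadsheet row."""
    errors: List[str] = []
    for field, rule in template.items():
        value = row.get(field, "").strip()
        if not value:
            # An empty cell is only an error when the field is required.
            if rule.get("required"):
                errors.append(f"{field}: required value missing")
            continue
        if "allowed" in rule and value not in rule["allowed"]:
            errors.append(f"{field}: '{value}' is not in the picklist")
        if "pattern" in rule and not re.fullmatch(rule["pattern"], value):
            errors.append(f"{field}: '{value}' does not match {rule['pattern']}")
    return errors


if __name__ == "__main__":
    row = {"sample_collection_date": "2021/03/04", "host_common_name": "human"}
    for problem in validate_row(row, TEMPLATE):
        print(problem)
```

Keeping the rules as data, as in this sketch, means the same validator can in principle serve several templates (for example, SARS-CoV-2, One Health or foodborne pathogens) without code changes.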


Subject(s)
COVID-19 , Humans , COVID-19/epidemiology , Pandemics , SARS-CoV-2/genetics , Canada , Genomics/methods
3.
BMJ Open ; 13(2): e066418, 2023 02 07.
Article in English | MEDLINE | ID: mdl-36750286

ABSTRACT

OBJECTIVES: COVID-19 research has contributed significantly to the pandemic response and to the enhancement of public health capacity. COVID-19 data collected by provincial/territorial health authorities in Canada are valuable for advancing research, yet are not readily available to the public, including researchers. To inform developments in public health data-sharing in Canada, we explored Canadians' opinions of public health authorities publicly sharing deidentified, individual-level COVID-19 data. DESIGN/SETTING/INTERVENTIONS/OUTCOMES: A national cross-sectional survey assessing Canadians' opinions on publicly sharing COVID-19 datatypes was administered in Canada in March 2022. The market research firm Léger was employed for recruitment and data collection. PARTICIPANTS: Anyone aged 18 years or older and currently living in Canada. RESULTS: 4981 participants completed the survey, a 92.3% response rate. 79.7% supported provincial/territorial authorities publicly sharing deidentified COVID-19 data, while 20.3% were hesitant, averse or unsure. The datatypes with the most support for public sharing were symptoms (83.0% in support), geographical region (82.6%) and COVID-19 vaccination status (81.7%). The datatypes with the most aversion were employment sector (27.4% averse), postal area (26.7%) and international travel history (19.7%). Generally supportive Canadians were characterised as being aged ≥50 years, having higher education, and having been vaccinated against COVID-19 at least once. Vaccination status was the most influential predictor of data-sharing opinion: respondents who had ever been vaccinated were 4.20 times more likely (95% CI 3.21 to 5.48, p<0.001) to be generally supportive of data-sharing than those who were unvaccinated. CONCLUSIONS: These findings suggest that the Canadian public is generally favourable towards deidentified data-sharing.
Identifying factors that are likely to improve attitudes towards data-sharing is useful to stakeholders involved in data-sharing initiatives, such as public health agencies, when developing public health communication and data-sharing policies. As Canada progresses through the COVID-19 pandemic, with limited testing and reporting of COVID-19 data, it is essential to improve deidentified data-sharing, given the public's general support for these efforts.


Subject(s)
COVID-19 , Humans , Cross-Sectional Studies , Public Opinion , Pandemics , COVID-19 Vaccines , Canada
4.
Nucleic Acids Res ; 51(D1): D690-D699, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36263822

ABSTRACT

The Comprehensive Antibiotic Resistance Database (CARD; card.mcmaster.ca) combines the Antibiotic Resistance Ontology (ARO) with curated antimicrobial resistance (AMR) gene (ARG) sequences and resistance-conferring mutations to provide an informatics framework for the annotation and interpretation of resistomes. As of version 3.2.4, CARD encompasses 6627 ontology terms, 5010 reference sequences, 1933 mutations, 3004 publications, and 5057 AMR detection models that can be used by the accompanying Resistance Gene Identifier (RGI) software to annotate genomic or metagenomic sequences. Focused curation enhancements since 2020 include expanded β-lactamase curation, incorporation of likelihood-based AMR mutations for Mycobacterium tuberculosis, addition of disinfectants and antiseptics plus their associated ARGs, and systematic curation of resistance-modifying agents. This expanded curation includes 180 new AMR gene families, 15 new drug classes, 1 new resistance mechanism, and two new ontological relationships: evolutionary_variant_of and is_small_molecule_inhibitor. In silico prediction of resistomes and prevalence statistics of ARGs have been expanded to 377 pathogens, 21,079 chromosomes, 2,662 genomic islands, 41,828 plasmids and 155,606 whole-genome shotgun assemblies, resulting in a collation of 322,710 unique ARG allele sequences. New features include the CARD:Live collection of community-submitted isolate resistome data and the introduction of standardized 15-character CARD Short Names for ARGs to support machine learning efforts.


Subject(s)
Data Curation , Databases, Factual , Drug Resistance, Microbial , Machine Learning , Anti-Bacterial Agents/pharmacology , Genes, Bacterial , Likelihood Functions , Software , Molecular Sequence Annotation
5.
Gigascience ; 11, 2022 02 16.
Article in English | MEDLINE | ID: mdl-35169842

ABSTRACT

BACKGROUND: The Public Health Alliance for Genomic Epidemiology (PHA4GE) (https://pha4ge.org) is a global coalition that is actively working to establish consensus standards, document and share best practices, improve the availability of critical bioinformatics tools and resources, and advocate for greater openness, interoperability, accessibility, and reproducibility in public health microbial bioinformatics. In the face of the current pandemic, PHA4GE has identified a need for a fit-for-purpose, open-source SARS-CoV-2 contextual data standard. RESULTS: To meet this need, we have developed a SARS-CoV-2 contextual data specification package based on harmonizable, publicly available community standards. The specification can be implemented via a collection template, as well as an array of protocols and tools to support both the harmonization and submission of sequence data and contextual information to public biorepositories. CONCLUSIONS: Well-structured, rich contextual data add value, promote reuse, and enable aggregation and integration of disparate datasets. Adoption of the proposed standard and practices will better enable interoperability between datasets and systems, improve the consistency and utility of generated data, and ultimately facilitate novel insights and discoveries in SARS-CoV-2 and COVID-19. The package is now supported by NCBI's BioSample database.


Subject(s)
COVID-19 , SARS-CoV-2 , Genomics , Humans , Metadata , Public Health , Reproducibility of Results
6.
Microb Genom ; 8(12), 2022 12.
Article in English | MEDLINE | ID: mdl-36748524

ABSTRACT

The White-Kauffmann-Le Minor (WKL) scheme is the most widely used Salmonella typing scheme for reporting the disease prevalence of this enteric pathogen. With the advent of whole-genome sequencing (WGS), in silico methods have increasingly replaced traditional serotyping because of their reproducibility, speed and coverage. However, despite the integration of genomic-based typing into in silico serotyping tools such as SISTR, in silico serotyping remains ambiguous and insufficiently informative in certain contexts. Specifically, in silico serotyping does not attempt to resolve polyphyly. Furthermore, despite widespread acknowledgement of polyphyly in genomic studies, the prevalence of polyphyletic serovars is not well characterized. Here, we applied a genomics approach to acquire the resolution necessary to classify genetically discordant serovars and propose an alternative typing scheme that consistently reflects natural Salmonella populations. Using the unprecedented volume of bacterial genomic data publicly available in the GenomeTrakr and PubMLST databases (>180 000 genomes representing 723 serovars), we characterized the global Salmonella population structure and systematically identified putative non-monophyletic serovars. The proportion of putative non-monophyletic serovars was estimated to be higher than previously reported, reinforcing the inability of antigenic determinants to depict the complexity of Salmonella evolutionary history. We explored the extent of genetic diversity masked by serotype labels and found significant intra-serovar molecular differences across many clinically important serovars. To avoid false discoveries due to incorrect in silico serotyping calls, we cross-referenced reported serovar labels and found a low error rate in in silico serotyping. The combined application of clustering statistics and genome-wide association methods effectively characterized stable bacterial populations and explained functional differences.
The methods adopted in our study have practical value for establishing genomic-based typing nomenclatures for an entire microbial species or for closely related subpopulations. Ultimately, we foresee that an improved typing scheme will be a hybrid that integrates both genomic and antigenic information, leveraging the resolution of WGS to improve the precision of subpopulation classification while preserving the common names defined by the WKL scheme.


Subject(s)
Salmonella enterica , Salmonella enterica/genetics , Reproducibility of Results , Genome-Wide Association Study , Salmonella/genetics , Genomics
7.
Front Genet ; 12: 716541, 2021.
Article in English | MEDLINE | ID: mdl-35401651

ABSTRACT

COVID-19 was declared a pandemic by the World Health Organization in March 2020. Timely sharing of viral genomic sequencing data accompanied by a minimal set of contextual data is essential for informing regional, national, and international public health responses. Such contextual data are also necessary for developing and improving clinical therapies and vaccines, and for enhancing the scientific community's understanding of the SARS-CoV-2 virus. The Canadian COVID-19 Genomics Network (CanCOGeN) was launched in April 2020 to coordinate and scale up existing genomics-based COVID-19 research and surveillance efforts. CanCOGeN is performing large-scale sequencing of both the genomes of SARS-CoV-2 virus samples (VirusSeq) and affected Canadians (HostSeq). This paper addresses the privacy concerns associated with sharing viral sequence data together with a predefined set of contextual data describing the sample source and case attributes of the sequence data in the Canadian context. Currently, viral genome sequences are shared by provincial public health laboratories and their healthcare and academic partners with the Canadian National Microbiology Laboratory and with publicly accessible databases. However, data-sharing delays and the provision of incomplete contextual data often occur because publicly releasing such data triggers privacy and data governance concerns. The CanCOGeN Ethics and Governance Expert Working Group has therefore investigated several privacy issues raised by CanCOGeN data providers/stewards. Our insights focus primarily on the Canadian context, although similar privacy considerations also exist in other jurisdictions. We maintain that sharing viral sequencing data and their limited associated contextual data in the public domain generally does not pose insurmountable privacy challenges.
However, privacy risks associated with reidentification should be actively monitored due to advancements in reidentification methods and the evolving pandemic landscape. We also argue that during a global health emergency such as COVID-19, privacy should not be used as a blanket measure to prevent such genomic data sharing due to the significant benefits it provides towards public health responses and ongoing research activities.

8.
Nucleic Acids Res ; 48(D1): D517-D525, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31665441

ABSTRACT

The Comprehensive Antibiotic Resistance Database (CARD; https://card.mcmaster.ca) is a curated resource providing reference DNA and protein sequences, detection models and bioinformatics tools on the molecular basis of bacterial antimicrobial resistance (AMR). CARD focuses on providing high-quality reference data and molecular sequences within a controlled vocabulary, the Antibiotic Resistance Ontology (ARO), designed by the CARD biocuration team to integrate with software development efforts for resistome analysis and prediction, such as CARD's Resistance Gene Identifier (RGI) software. Since 2017, CARD has expanded through extensive curation of reference sequences, revision of the ontological structure, curation of over 500 new AMR detection models, development of a new classification paradigm and expansion of analytical tools. Most notably, a new Resistomes & Variants module provides analysis and statistical summary of in silico predicted resistance variants from 82 pathogens and over 100 000 genomes. By adding these resistance variants to CARD, we are able to summarize predicted resistance using the information included in CARD, identify trends in AMR mobility and determine previously undescribed and novel resistance variants. Here, we describe updates and recent expansions to CARD and its biocuration process, including new resources for community biocuration of AMR molecular reference data.


Subject(s)
Databases, Genetic , Drug Resistance, Bacterial , Genes, Bacterial , Software , Bacteria/drug effects , Bacteria/genetics , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Bacterial Proteins/metabolism
9.
NPJ Sci Food ; 2: 23, 2018.
Article in English | MEDLINE | ID: mdl-31304272

ABSTRACT

The construction of high-capacity data-sharing networks to support increasing government and commercial data exchange has highlighted a key roadblock: the content of existing Internet-connected information remains siloed due to a multiplicity of local languages and data dictionaries. This lack of a digital lingua franca is obvious in the domain of human food as materials travel from their wild or farm origin, through processing and distribution chains, to consumers. A well-defined, hierarchical vocabulary connected with logical relationships (in other words, an ontology) is urgently needed to help tackle data harmonization problems that span the domains of food security, safety, quality, production, distribution, and consumer health and convenience. FoodOn (http://foodon.org) is a consortium-driven project to build a comprehensive and easily accessible global farm-to-fork ontology about food that accurately and consistently describes foods commonly known in cultures around the world. FoodOn addresses food product terminology gaps and supports food traceability. Focusing on human and domesticated animal food description, FoodOn contains animal and plant food sources, food categories and products, and other facets such as preservation processes, contact surfaces, and packaging. Much of FoodOn's vocabulary comes from transforming LanguaL, a mature and popular food indexing thesaurus, into a World Wide Web Consortium (W3C) OWL Web Ontology Language-formatted vocabulary that provides system interoperability, quality control, and software-driven intelligence. FoodOn complements other technologies that facilitate food traceability, which is becoming critical in this age of increasingly globalized food networks.

10.
Front Microbiol ; 8: 1068, 2017.
Article in English | MEDLINE | ID: mdl-28694792

ABSTRACT

Globalization of food networks increases opportunities for the spread of foodborne pathogens beyond borders and jurisdictions. High-resolution whole-genome sequencing (WGS) subtyping of pathogens promises to vastly improve our ability to track and control foodborne disease, but to do so it must be combined with epidemiological, clinical, laboratory and other health care data (called "contextual data") to be meaningfully interpreted for regulatory and health interventions, outbreak investigation, and risk assessment. However, current multi-jurisdictional pathogen surveillance and investigation efforts are complicated by time-consuming data re-entry, curation and integration of contextual information, owing to a lack of interoperable standards and inconsistent reporting. A solution to these challenges is the use of 'ontologies': hierarchies of well-defined, standardized vocabularies interconnected by logical relationships. Terms are specified by universal IDs, enabling integration into highly regulated areas and multi-sector sharing (e.g., food and water microbiology with the veterinary sector). Institution-specific terms can be mapped to a given standard at different levels of granularity, maximizing the comparability of contextual information according to jurisdictional policies. Fit-for-purpose ontologies provide contextual information with the auditability required for food safety laboratory accreditation. Our research efforts include the development of a Genomic Epidemiology Ontology (GenEpiO) and a Food Ontology (FoodOn) that harmonize important laboratory, clinical and epidemiological data fields, as well as existing food resources. These efforts are supported by a global consortium of researchers and stakeholders. Because foodborne diseases do not respect international borders, uptake of such vocabularies will be crucial for the multi-jurisdictional interpretation of WGS results and for data sharing.

11.
Am J Infect Control ; 45(2): 170-179, 2017 Feb 01.
Article in English | MEDLINE | ID: mdl-28159067

ABSTRACT

With the growing importance of infectious diseases in health care and communicable disease outbreaks garnering increasing attention, new technologies are playing a greater role in helping us prevent health care-associated infections and provide optimal public health. The microbiology laboratory has always played a large role in infection control by providing tools to identify, characterize, and track pathogens. Recently, advances in DNA sequencing technology have ushered in a new era of genomic epidemiology, where traditional molecular diagnostics and genotyping methods are being enhanced and even replaced by genomics-based methods to aid epidemiologic investigations of communicable diseases. The ability to analyze and compare entire pathogen genomes has allowed for unprecedented resolution into how and why infectious diseases spread. As these genomics-based methods continue to improve in speed, cost, and accuracy, they will be increasingly used to inform and guide infection control and public health practices.


Subject(s)
Cross Infection/diagnosis , Cross Infection/prevention & control , Infection Control/methods , Molecular Epidemiology/methods , Genotyping Techniques/methods , High-Throughput Nucleotide Sequencing/methods , Humans
12.
Bioinformatics ; 32(8): 1275-7, 2016 04 15.
Article in English | MEDLINE | ID: mdl-26656932

ABSTRACT

MOTIVATION: There are various reasons for rerunning bioinformatics tools and pipelines on sequencing data, including reproducing a past result, validating a new tool or workflow using a known dataset, or tracking the impact of database changes. For identical results to be achieved, regularly updated reference sequence databases must be versioned and archived. Database administrators have tried to meet these requirements by supplying users with one-off versions of databases, but these are time-consuming to set up and are inconsistent across resources. Disk storage and data backup performance have also discouraged maintaining multiple versions of databases, since databases such as NCBI nr can consume 50 GB or more of disk space per version, with growth rates that parallel Moore's law. RESULTS: Our end-to-end solution combines our own Kipper software package (a simple key-value large-file versioning system) with BioMAJ (software for downloading sequence databases) and Galaxy (a web-based bioinformatics data processing platform). Available versions of databases can be recalled and used by command-line and Galaxy users. The Kipper data store format makes publishing curated FASTA databases convenient, since in most cases it can store a range of versions in a file marginally larger than the latest version. AVAILABILITY AND IMPLEMENTATION: Kipper v1.0.0 and the Galaxy Versioned Data tool are written in Python and released as free and open-source software, available at https://github.com/Public-Health-Bioinformatics/kipper and https://github.com/Public-Health-Bioinformatics/versioned_data, respectively; detailed setup instructions can be found at https://github.com/Public-Health-Bioinformatics/versioned_data/blob/master/doc/setup.md CONTACT: Damion.Dooley@Bccdc.Ca or William.Hsiao@Bccdc.Ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
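How can many database versions fit in a file only marginally larger than the latest one? One way a key-value versioning scheme can achieve this is to tag every record with the version in which it appeared and the version in which it was superseded, so records that do not change between releases are stored only once. The Python sketch below is a toy, in-memory illustration of that idea under this assumption; the class and method names are hypothetical, and it does not reproduce Kipper's actual file format or command-line interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    key: str                # e.g. a FASTA sequence identifier
    value: str              # e.g. the sequence itself
    added: int              # first version containing this key/value pair
    removed: Optional[int]  # first version no longer containing it; None if current


class VersionedStore:
    """Toy store: each distinct key/value pair is kept once with its version range."""

    def __init__(self) -> None:
        self.records: List[Record] = []
        self.latest = 0

    def commit(self, snapshot: Dict[str, str]) -> int:
        """Record a full new snapshot of the database and return its version number."""
        self.latest += 1
        version = self.latest
        live = {r.key: r for r in self.records if r.removed is None}
        # Close the version range of records that were deleted or changed.
        for key, rec in live.items():
            if snapshot.get(key) != rec.value:
                rec.removed = version
        # Append only records that are new or whose value changed.
        for key, value in snapshot.items():
            rec = live.get(key)
            if rec is None or rec.value != value:
                self.records.append(Record(key, value, version, None))
        return version

    def checkout(self, version: int) -> Dict[str, str]:
        """Reconstruct the database exactly as it was at `version`."""
        return {
            r.key: r.value
            for r in self.records
            if r.added <= version and (r.removed is None or version < r.removed)
        }


if __name__ == "__main__":
    store = VersionedStore()
    store.commit({"seq1": "ACGT", "seq2": "GGCC"})
    store.commit({"seq1": "ACGT", "seq2": "GGCA", "seq3": "TTAA"})
    print(store.checkout(1))
    print(store.checkout(2))
    print(len(store.records))  # 4 stored records instead of 5 across the two snapshots
```

Because reference databases usually change by small increments between releases, most records in such a scheme carry a long version range, which is why the stored total can stay close to the size of the newest release.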


Subject(s)
Computational Biology , Databases, Nucleic Acid , Software , User-Computer Interface
13.
Appl Environ Microbiol ; 81(14): 4827-34, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25956776

ABSTRACT

Giardia is the most common parasitic cause of gastrointestinal infections worldwide, with transmission through surface water playing an important role in various parts of the world. Giardia duodenalis (synonyms: G. intestinalis and G. lamblia), a multispecies complex, has two zoonotic subtypes, assemblages A and B. When British Columbia (BC), a western Canadian province, experienced several waterborne giardiasis outbreaks due to unfiltered surface drinking water in the late 1980s, isolates were collected from surface water, as well as from humans and beavers (Castor canadensis), throughout the province. To better understand Giardia in surface water, 71 isolates from this historical library, including 29 from raw surface water samples, 29 from human giardiasis cases, and 13 from beavers in watersheds, were characterized by PCR. Study isolates also included isolates from waterborne giardiasis outbreaks. Both assemblages A and B were identified in surface water, human, and beaver samples, including a mixture of assemblages A and B in waterborne outbreaks. PCR results were confirmed by whole-genome sequencing (WGS) for one waterborne outbreak, and WGS supported the clustering of human, water, and beaver isolates within both assemblages. We concluded that contamination of surface water by Giardia is complex, that the majority of our surface water isolates were assemblage B, and that both assemblages A and B may cause waterborne outbreaks. The higher-resolution data provided by WGS warrant further study to better understand the spread of Giardia.


Subject(s)
Fresh Water/parasitology , Giardia lamblia/classification , Giardia lamblia/isolation & purification , British Columbia , Genome, Protozoan , Genotype , Giardia lamblia/genetics , Giardiasis/parasitology , Humans , Molecular Sequence Data , Phylogeny , Polymerase Chain Reaction
14.
Front Microbiol ; 6: 1405, 2015.
Article in English | MEDLINE | ID: mdl-26733955

ABSTRACT

Select bacteria, such as Escherichia coli or coliforms, have been widely used as sentinels of low water quality; however, there are concerns regarding their predictive accuracy for the protection of human and environmental health. To develop improved monitoring systems, a greater understanding of bacterial community structure, function, and variability across time is required in the context of different pollution types, such as agricultural and urban contamination. Here, we present a year-long survey of free-living bacterial DNA collected from seven sites along rivers in three watersheds with varying land use in Southwestern Canada. This is the first study to examine the bacterial metagenome in flowing freshwater (lotic) environments over such a time span, providing an opportunity to describe bacterial community variability as a function of land use and environmental conditions. Characteristics of the metagenomic data, such as sequence composition and average genome size (AGS), vary with sampling site, environmental conditions, and water chemistry. For example, AGS was correlated with hours of daylight in the agricultural watershed and, across the agriculturally and urban-affected sites, k-mer composition clustering corresponded to nutrient concentrations. In addition to indicating a community shift, this change in AGS has implications in terms of the normalization strategies required, and considerations surrounding such strategies in general are discussed. When comparing abundances of gene functional groups between high- and low-quality water samples collected from an agricultural area, the latter had a higher abundance of nutrient metabolism and bacteriophage groups, possibly reflecting an increase in agricultural runoff. This work presents a valuable dataset representing a year of monthly sampling across watersheds and an analysis targeted at establishing a foundational understanding of how bacterial lotic communities vary across time and land use. 
The results provide important context for future studies, including further analyses of watershed ecosystem health, and the identification and development of biomarkers for improved water quality monitoring systems.

15.
PLoS One ; 8(3): e59484, 2013.
Article in English | MEDLINE | ID: mdl-23544073

ABSTRACT

BACKGROUND: Recent studies on genome assembly from short-read sequencing data have reported that this technology is limited in its ability to reconstruct the entire genome, even at very high depth of coverage. We investigated this limitation from the perspective of information theory, evaluating the effect of repeats on short-read genome assembly using idealized (error-free) reads of different lengths. METHODOLOGY/PRINCIPAL FINDINGS: We define a metric H(k) as the entropy of sequencing reads at read length k and use the relative loss of entropy ΔH(k) to measure the impact of repeats on the reconstruction of a whole genome from sequences of length k. In our experiments, we found that entropy loss correlates well with the de novo assembly coverage of a genome, and that a score of ΔH(k)>1% indicates a severe loss in genome reconstruction fidelity. The minimal read lengths needed to achieve ΔH(k)<1% differ among organisms and are independent of genome size. For example, to meet the threshold of ΔH(k)<1%, a read length of 60 bp is needed for sequencing the human genome (3.2×10^9 bp) and 320 bp for sequencing the fruit fly genome (1.8×10^8 bp). We also calculated ΔH(k) scores for 2725 prokaryotic chromosomes and plasmids at several read lengths. Our results indicate that the levels of repeats in different genomes are diverse and that the entropy of sequencing reads provides a measurement of repeat structure. CONCLUSIONS/SIGNIFICANCE: The proposed entropy-based measurement, which can be calculated in seconds to minutes in most cases, provides a rapid quantitative evaluation of the limitations of idealized short-read genome sequencing. Moreover, the calculation can be parallelized to scale up to large eukaryotic genomes. This approach may be useful for tuning sequencing parameters to achieve better genome assemblies when a closely related genome is already available.
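The entropy measure can be made concrete with a short sketch. The abstract does not give the exact formula, so this Python example assumes one plausible formulation rather than the authors' implementation: every length-k substring of a single strand is an idealized, error-free read; H(k) is the Shannon entropy (in bits) of the distribution of distinct reads; and ΔH(k) is the relative loss against the maximum log2(N), reached when all N reads are unique. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Optional, Tuple


def read_entropy(genome: str, k: int) -> Tuple[float, float]:
    """Return (H(k), ΔH(k)) for idealized error-free reads of length k.

    Assumption: the reads are all length-k substrings of one strand of `genome`.
    """
    n = len(genome) - k + 1  # number of reads
    if k < 1 or n < 2:
        raise ValueError("need 1 <= k < len(genome)")
    counts = Counter(genome[i:i + k] for i in range(n))
    # Shannon entropy of the read distribution; repeated reads lower it.
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    h_max = math.log2(n)  # entropy if every read were unique
    return h, (h_max - h) / h_max


def min_read_length(genome: str, ks: Iterable[int], threshold: float = 0.01) -> Optional[int]:
    """Smallest k in `ks` whose relative entropy loss falls below `threshold` (1% by default)."""
    for k in sorted(ks):
        if read_entropy(genome, k)[1] < threshold:
            return k
    return None
```

For example, `read_entropy("AAAAAAAA", 2)` gives ΔH(k) = 1.0 because all seven reads are identical, whereas a sequence whose length-k substrings are all distinct gives ΔH(k) of essentially 0. Scanning candidate read lengths with `min_read_length` mirrors, under these assumptions, the search for the minimal read length that satisfies ΔH(k)<1%.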


Subject(s)
Entropy , Genome/genetics , Repetitive Sequences, Nucleic Acid/genetics , Sequence Analysis, DNA/methods , Animals , Bacteria/genetics , Base Pairing/genetics , Base Sequence , Chromosomes/genetics , Chromosomes, Artificial, Bacterial/genetics , Humans , Prokaryotic Cells/metabolism
16.
Bioinformatics ; 29(8): 1004-10, 2013 Apr 15.
Article in English | MEDLINE | ID: mdl-23457040

ABSTRACT

MOTIVATION: High-accuracy de novo assembly of the short sequencing reads from RNA-Seq technology is very challenging. We introduce a de novo assembly algorithm, EBARDenovo, which stands for Extension, Bridging And Repeat-sensing Denovo. This algorithm uses an efficient chimera-detection function to abrogate the effect of aberrant chimeric reads in RNA-Seq data. RESULTS: EBARDenovo resolves the complications of RNA-Seq assembly arising from sequencing errors, repetitive sequences and aberrant chimeric amplicons. In a series of assembly experiments, our algorithm is the most accurate among the examined programs, including de Bruijn graph assemblers, Trinity and Oases. AVAILABILITY AND IMPLEMENTATION: EBARDenovo is available at http://ebardenovo.sourceforge.net/. This software package (with patent pending) is free of charge for academic use only. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , RNA/chemistry , Repetitive Sequences, Nucleic Acid , Software
17.
Bioinformatics ; 28(14): 1947-8, 2012 Jul 15.
Article in English | MEDLINE | ID: mdl-22576174

ABSTRACT

Analysis of microbial genomes often requires organizing and comparing tens to thousands of genomes from both public repositories and unpublished sources. MicrobeDB provides a foundation for such projects by automating the download of published, completed bacterial and archaeal genomes from key sources, parsing the annotations of all genomes (both public and private) into a local database, and allowing interaction with the database through an easy-to-use programming interface. MicrobeDB creates a simple, easy-to-maintain, centralized local resource for various large-scale comparative genomic analyses and a back-end for future microbial application design. AVAILABILITY: MicrobeDB is freely available under the GNU GPL at: http://github.com/mlangill/microbedb/


Subject(s)
Databases, Genetic, Genome, Archaeal, Genome, Bacterial, Computational Biology/methods, Genomics, Internet, Software, User-Computer Interface
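To make the idea of a "centralized local resource" concrete, the sketch below loads annotations from public and private genomes into one local SQLite database and queries across all of them. It is not MicrobeDB's schema or programming interface: the table names, columns and the helpers `open_db`, `load_genome` and `find_genes` are hypothetical, and parsing real annotation files is left out.

```python
import sqlite3

# Hypothetical schema: one row per genome, one row per annotated gene.
SCHEMA = """
CREATE TABLE IF NOT EXISTS genome (
    id INTEGER PRIMARY KEY,
    organism TEXT NOT NULL,
    accession TEXT UNIQUE NOT NULL,
    source TEXT NOT NULL CHECK (source IN ('public', 'private'))
);
CREATE TABLE IF NOT EXISTS gene (
    id INTEGER PRIMARY KEY,
    genome_id INTEGER NOT NULL REFERENCES genome(id),
    locus_tag TEXT,
    product TEXT,
    gene_start INTEGER,
    gene_end INTEGER
);
"""


def open_db(path=":memory:"):
    """Open (or create) the local database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def load_genome(conn, organism, accession, source, genes):
    """Insert one genome and its genes, given as (locus_tag, product, start, end) tuples."""
    cur = conn.execute(
        "INSERT INTO genome (organism, accession, source) VALUES (?, ?, ?)",
        (organism, accession, source),
    )
    genome_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO gene (genome_id, locus_tag, product, gene_start, gene_end) "
        "VALUES (?, ?, ?, ?, ?)",
        [(genome_id, *gene) for gene in genes],
    )
    conn.commit()
    return genome_id


def find_genes(conn, product_substring):
    """Return (organism, locus_tag) pairs, across all genomes, whose product matches."""
    return conn.execute(
        "SELECT genome.organism, gene.locus_tag FROM gene "
        "JOIN genome ON gene.genome_id = genome.id "
        "WHERE gene.product LIKE ? ORDER BY genome.organism, gene.locus_tag",
        (f"%{product_substring}%",),
    ).fetchall()
```

Keeping public and private genomes in the same tables, distinguished only by a `source` column, is what lets a single query span both, which is the kind of comparative use the abstract describes.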
18.
BMC Genomics ; 13 Suppl 7: S5, 2012.
Article in English | MEDLINE | ID: mdl-23282223

ABSTRACT

BACKGROUND: Mitochondrial dysfunction is associated with various age-related diseases. The mtDNA copy number in human cells may therefore be a potential biomarker for the diagnosis of aging. Here we propose a new computational method for accurately assessing mtDNA copy number from whole-genome sequencing data. RESULTS: Human whole-genome sequencing datasets from two families in the HapMap and 1000 Genomes projects were used to count mitochondrial DNA copy numbers. The results revealed that the parents' mitochondrial DNA copy numbers were significantly lower than those of their children in these samples: the children's samples contained 8% to 21% more copies of mtDNA than their parents'. The experiment demonstrated possible correlations between the quantity of mitochondrial DNA and aging-related diseases. CONCLUSIONS: Because next-generation sequencing strives to deliver affordable and unbiased sequencing results, mtDNA copy numbers can be assessed accurately and effectively from the output of whole-genome sequencing. We implemented the method as a software package, MitoCounter, with the source code and user's guide available to the public at http://sourceforge.net/projects/mitocounter/.


Subject(s)
DNA, Mitochondrial/metabolism, Genome, Human, Mitochondria/genetics, Adult, Child, Databases, Genetic, Female, Humans, Male, Sequence Analysis, DNA, Software
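The abstract does not describe how MitoCounter derives copy numbers. A common approach for whole-genome sequencing data, shown here only as a sketch and not as MitoCounter's method, divides the mean read depth over the mitochondrial genome by the mean depth over the nuclear (autosomal) genome and multiplies by the nuclear ploidy (2 for diploid human cells). The function names are hypothetical, and real pipelines must also handle nuclear mitochondrial pseudogenes (NUMTs) and mapping-quality filtering.

```python
def mean_coverage(aligned_bases: int, region_length: int) -> float:
    """Mean read depth: total aligned bases divided by the length of the region."""
    return aligned_bases / region_length


def mtdna_copy_number(mt_depth: float, nuclear_depth: float, ploidy: int = 2) -> float:
    """mtDNA copies per cell, assuming `ploidy` copies of the nuclear genome per cell."""
    if nuclear_depth <= 0:
        raise ValueError("nuclear depth must be positive")
    return mt_depth / nuclear_depth * ploidy


def percent_increase(child: float, parent: float) -> float:
    """Relative difference in percent, as in the child-versus-parent comparison."""
    return 100.0 * (child - parent) / parent
```

With illustrative numbers (not from the study), a sample with 3000× depth on the 16,569 bp mitochondrial genome and 30× autosomal depth gives `mtdna_copy_number(3000.0, 30.0)` = 200 copies per cell; a child with 121 copies versus a parent with 100 gives `percent_increase(121.0, 100.0)` = 21%.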
19.
Am J Infect Control ; 38(9): 751-3, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20570393

ABSTRACT

In this study, we identified critically ill patients with Acinetobacter baumannii bacteremia and examined perirectal surveillance cultures for genetically related A. baumannii strains, using pulsed-field gel electrophoresis, to determine whether gut colonization preceded clinical infection. Seven patients with imipenem-resistant A. baumannii bacteremia were identified from January to June 2008. Six of 7 (86%) patients were colonized in the gastrointestinal tract with genetically similar strains before their bacteremia.


Subject(s)
Acinetobacter Infections/microbiology, Acinetobacter baumannii/classification, Bacteremia/microbiology, Bacterial Typing Techniques, Blood/microbiology, Gastrointestinal Tract/microbiology, Acinetobacter baumannii/drug effects, Acinetobacter baumannii/isolation & purification, Adult, Aged, Anti-Bacterial Agents/pharmacology, Critical Illness, Cross Infection/microbiology, DNA Fingerprinting, Electrophoresis, Gel, Pulsed-Field, Female, Humans, Imipenem/pharmacology, Male, Middle Aged, Molecular Epidemiology, beta-Lactam Resistance
20.
Nat Rev Microbiol ; 8(5): 373-82, 2010 05.
Article in English | MEDLINE | ID: mdl-20395967

ABSTRACT

Bacterial genomes contain clusters of horizontally acquired genes called genomic islands (GIs). GIs are frequently associated with microbial adaptations that are of medical and environmental interest, and they have had a substantial impact on bacterial evolution. Therefore, there is growing interest in efficiently identifying GIs in newly sequenced bacterial genomes. Several computational methods for detecting GIs have been developed recently, presenting researchers with a myriad of choices. Here, we discuss the limitations and benefits of the main approaches that are available and present guidelines to aid researchers in effectively identifying these important genomic regions.


Subject(s)
Computational Biology/methods, Genome, Bacterial, Genomic Islands, Bacteria/genetics, Bacteria/pathogenicity, Databases, Genetic, Genomics/methods, Genomics/statistics & numerical data, Virulence/genetics
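Sequence-composition methods are one widely used family of GI predictors: horizontally acquired DNA often retains the composition of its donor, so it can stand out from the host genome. The sketch below shows only the simplest variant, a sliding-window GC-content outlier test. It is an illustration under stated assumptions, not one of the specific tools the review evaluates; the window size, step and z-score cut-off are hypothetical defaults.

```python
import statistics


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def atypical_gc_windows(genome: str, window: int = 5000, step: int = 1000, z: float = 2.0):
    """Return (start, end) windows whose GC content differs from the mean over all
    windows by more than z population standard deviations."""
    windows = [
        (start, gc_content(genome[start:start + window]))
        for start in range(0, len(genome) - window + 1, step)
    ]
    if len(windows) < 2:
        return []  # too few windows to estimate a background distribution
    values = [gc for _, gc in windows]
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # perfectly uniform composition: nothing stands out
    return [(start, start + window) for start, gc in windows if abs(gc - mean) > z * sd]
```

Overlapping flagged windows would normally be merged into candidate regions, and practical predictors combine several signals (for example codon usage, mobility genes or comparative genomics), since GC content alone misses islands whose composition resembles the host's.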