1.
Microb Genom ; 10(6), 2024 Jun.
Article in English | MEDLINE | ID: mdl-38860884

ABSTRACT

As public health laboratories expand their genomic sequencing and bioinformatics capacity for the surveillance of different pathogens, labs must carry out robust validation, training, and optimization of wet- and dry-lab procedures. Achieving these goals for algorithms, pipelines and instruments often requires that lower quality datasets be made available for analysis and comparison alongside those of higher quality. This range of data quality in reference sets can complicate the sharing of sub-optimal datasets that are vital for the community and for the reproducibility of assays. Sharing of useful, but sub-optimal datasets requires careful annotation and documentation of known issues to enable appropriate interpretation, avoid being mistaken for better quality information, and for these data (and their derivatives) to be easily identifiable in repositories. Unfortunately, there are currently no standardized attributes or mechanisms for tagging poor-quality datasets, or datasets generated for a specific purpose, to maximize their utility, searchability, accessibility and reuse. The Public Health Alliance for Genomic Epidemiology (PHA4GE) is an international community of scientists from public health, industry and academia focused on improving the reproducibility, interoperability, portability, and openness of public health bioinformatic software, skills, tools and data. To address the challenges of sharing lower quality datasets, PHA4GE has developed a set of standardized contextual data tags, namely fields and terms, that can be included in public repository submissions as a means of flagging pathogen sequence data with known quality issues, increasing their discoverability. 
The contextual data tags were developed through consultations with the community including input from the International Nucleotide Sequence Database Collaboration (INSDC), and have been standardized using ontologies - community-based resources for defining the tag properties and the relationships between them. The standardized tags are agnostic to the organism and the sequencing technique used and thus can be applied to data generated from any pathogen using an array of sequencing techniques. The tags can also be applied to synthetic (lab created) data. The list of standardized tags is maintained by PHA4GE and can be found at https://github.com/pha4ge/contextual_data_QC_tags. Definitions, ontology IDs, examples of use, as well as a JSON representation, are provided. The PHA4GE QC tags were tested, and are now implemented, by the FDA's GenomeTrakr laboratory network as part of its routine submission process for SARS-CoV-2 wastewater surveillance. We hope that these simple, standardized tags will help improve communication regarding quality control in public repositories, in addition to making datasets of variable quality more easily identifiable. Suggestions for additional tags can be submitted to PHA4GE via the New Term Request Form in the GitHub repository. By providing a mechanism for feedback and suggestions, we also expect that the tags will evolve with the needs of the community.
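As a rough illustration of how a quality tag might travel with a submission record, the sketch below attaches a list of known issues to a metadata dictionary and serializes it as JSON. The field name `quality_control_issues`, the issue strings, and the sample record are hypothetical placeholders, not the standardized PHA4GE fields or terms; the repository named above is the place to find the actual tags.

```python
import json

# Hypothetical field name, chosen for illustration only; it is NOT taken
# from the PHA4GE specification.
QC_FIELD = "quality_control_issues"


def flag_record(record: dict, issues: list[str]) -> dict:
    """Return a copy of a metadata record with known quality issues attached."""
    tagged = dict(record)
    # De-duplicate and sort so the serialized record is stable.
    tagged[QC_FIELD] = sorted(set(issues))
    return tagged


def has_known_issues(record: dict) -> bool:
    """True when a record carries at least one quality issue tag."""
    return bool(record.get(QC_FIELD))


# Hypothetical sample record and issue terms.
sample = {"sample_id": "example-001", "organism": "SARS-CoV-2"}
tagged = flag_record(sample, ["low coverage", "low coverage", "contamination"])
print(json.dumps(tagged, indent=2))
```

A repository search could then separate flagged datasets from unflagged ones with a check such as `has_known_issues(record)`, which is the discoverability benefit the abstract describes.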


Subjects
Computational Biology , Public Health , Quality Control , Humans , Computational Biology/methods , Information Dissemination/methods , Reproducibility of Results , Molecular Sequence Annotation/methods , Genomics/methods , Software
2.
PLoS One ; 18(6): e0286728, 2023.
Article in English | MEDLINE | ID: mdl-37267413

ABSTRACT

An application ontology often reuses terms from other related, compatible ontologies. The extent of this interconnectedness is not readily apparent when browsing through larger textual presentations of term class hierarchies, be it Manchester text format OWL files or within an ontology editor like Protégé. Users must either note ontology sources in term identifiers, or look at ontology import file term origins. Diagrammatically, this same information may be easier to perceive in 2-dimensional network or hierarchical graphs that visually code ontology term origins. However, humans, having stereoscopic vision and navigational acuity around colored and textured shapes, should benefit even more from a coherent 3-dimensional interactive visualization of an ontology that takes advantage of perspective to offer both foreground focus on content and a stable background context. We present OntoTrek, a 3D ontology visualizer that enables ontology stakeholders (students, software developers, curation teams, and funders) to recognize the presence of imported terms and their domains, ultimately illustrating how projects can capture knowledge through a vocabulary of interwoven community-supported ontology resources.


Subjects
Imaging, Three-Dimensional , Software , Humans
3.
BMJ Open ; 13(2): e066418, 2023 02 07.
Article in English | MEDLINE | ID: mdl-36750286

ABSTRACT

OBJECTIVES: COVID-19 research has significantly contributed to pandemic response and the enhancement of public health capacity. COVID-19 data collected by provincial/territorial health authorities in Canada are valuable for research advancement yet not readily available to the public, including researchers. To inform developments in public health data-sharing in Canada, we explored Canadians' opinions of public health authorities sharing deidentified individual-level COVID-19 data publicly. DESIGN/SETTING/INTERVENTIONS/OUTCOMES: A national cross-sectional survey was administered in Canada in March 2022, assessing Canadians' opinions on publicly sharing COVID-19 datatypes. Market research firm Léger was employed for recruitment and data collection. PARTICIPANTS: Anyone greater than or equal to 18 years and currently living in Canada. RESULTS: 4981 participants completed the survey with a 92.3% response rate. 79.7% were supportive of provincial/territorial authorities publicly sharing deidentified COVID-19 data, while 20.3% were hesitant/averse/unsure. Datatypes most supported for being shared publicly were symptoms (83.0% in support), geographical region (82.6%) and COVID-19 vaccination status (81.7%). Datatypes with the most aversion were employment sector (27.4% averse), postal area (26.7%) and international travel history (19.7%). Generally supportive Canadians were characterised as being ≥50 years, with higher education, and being vaccinated against COVID-19 at least once. Vaccination status was the most influential predictor of data-sharing opinion, with respondents who were ever vaccinated being 4.20 times more likely (95% CI 3.21 to 5.48, p=0.000) to be generally supportive of data-sharing than those unvaccinated. CONCLUSIONS: These findings suggest that the Canadian public is generally favourable to deidentified data-sharing. 
Identifying factors that are likely to improve attitudes towards data-sharing is useful to stakeholders involved in data-sharing initiatives, such as public health agencies, in informing the development of public health communication and data-sharing policies. As Canada progresses through the COVID-19 pandemic, and with limited testing and reporting of COVID-19 data, it is essential to improve deidentified data-sharing given the public's general support for these efforts.


Subjects
COVID-19 , Humans , Cross-Sectional Studies , Public Opinion , Pandemics , COVID-19 Vaccines , Canada
4.
Microb Genom ; 9(1), 2023 01.
Article in English | MEDLINE | ID: mdl-36748616

ABSTRACT

Pathogen genomics is a critical tool for public health surveillance, infection control, outbreak investigations as well as research. In order to make use of pathogen genomics data, they must be interpreted using contextual data (metadata). Contextual data include sample metadata, laboratory methods, patient demographics, clinical outcomes and epidemiological information. However, the variability in how contextual information is captured by different authorities and how it is encoded in different databases poses challenges for data interpretation, integration and their use/re-use. The DataHarmonizer is a template-driven spreadsheet application for harmonizing, validating and transforming genomics contextual data into submission-ready formats for public or private repositories. The tool's web browser-based JavaScript environment enables validation, and its offline functionality and local installation increase data security. The DataHarmonizer was developed to address the data sharing needs that arose during the COVID-19 pandemic, and was used by members of the Canadian COVID Genomics Network (CanCOGeN) to harmonize SARS-CoV-2 contextual data for national surveillance and for public repository submission. In order to support coordination of international surveillance efforts, we have partnered with the Public Health Alliance for Genomic Epidemiology to also provide a template conforming to its SARS-CoV-2 contextual data specification for use worldwide. Templates are also being developed for One Health and foodborne pathogens. Overall, the DataHarmonizer tool improves the effectiveness and fidelity of contextual data capture as well as its subsequent usability. Harmonization of contextual information across authorities, platforms and systems globally improves interoperability and reusability of data for concerted public health and research initiatives to fight the current pandemic and future public health emergencies. 
While initially developed for the COVID-19 pandemic, its expansion to other data management applications and pathogens is already underway.


Subjects
COVID-19 , Humans , COVID-19/epidemiology , Pandemics , SARS-CoV-2/genetics , Canada , Genomics/methods
5.
Nucleic Acids Res ; 51(D1): D690-D699, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36263822

ABSTRACT

The Comprehensive Antibiotic Resistance Database (CARD; card.mcmaster.ca) combines the Antibiotic Resistance Ontology (ARO) with curated AMR gene (ARG) sequences and resistance-conferring mutations to provide an informatics framework for annotation and interpretation of resistomes. As of version 3.2.4, CARD encompasses 6627 ontology terms, 5010 reference sequences, 1933 mutations, 3004 publications, and 5057 AMR detection models that can be used by the accompanying Resistance Gene Identifier (RGI) software to annotate genomic or metagenomic sequences. Focused curation enhancements since 2020 include expanded β-lactamase curation, incorporation of likelihood-based AMR mutations for Mycobacterium tuberculosis, addition of disinfectants and antiseptics plus their associated ARGs, and systematic curation of resistance-modifying agents. This expanded curation includes 180 new AMR gene families, 15 new drug classes, 1 new resistance mechanism, and two new ontological relationships: evolutionary_variant_of and is_small_molecule_inhibitor. In silico prediction of resistomes and prevalence statistics of ARGs has been expanded to 377 pathogens, 21,079 chromosomes, 2,662 genomic islands, 41,828 plasmids and 155,606 whole-genome shotgun assemblies, resulting in collation of 322,710 unique ARG allele sequences. New features include the CARD:Live collection of community submitted isolate resistome data and the introduction of standardized 15 character CARD Short Names for ARGs to support machine learning efforts.


Subjects
Data Curation , Databases, Factual , Drug Resistance, Microbial , Machine Learning , Anti-Bacterial Agents/pharmacology , Genes, Bacterial , Likelihood Functions , Software , Molecular Sequence Annotation
6.
Gigascience ; 11, 2022 02 16.
Article in English | MEDLINE | ID: mdl-35169842

ABSTRACT

BACKGROUND: The Public Health Alliance for Genomic Epidemiology (PHA4GE) (https://pha4ge.org) is a global coalition that is actively working to establish consensus standards, document and share best practices, improve the availability of critical bioinformatics tools and resources, and advocate for greater openness, interoperability, accessibility, and reproducibility in public health microbial bioinformatics. In the face of the current pandemic, PHA4GE has identified a need for a fit-for-purpose, open-source SARS-CoV-2 contextual data standard. RESULTS: As such, we have developed a SARS-CoV-2 contextual data specification package based on harmonizable, publicly available community standards. The specification can be implemented via a collection template, as well as an array of protocols and tools to support both the harmonization and submission of sequence data and contextual information to public biorepositories. CONCLUSIONS: Well-structured, rich contextual data add value, promote reuse, and enable aggregation and integration of disparate datasets. Adoption of the proposed standard and practices will better enable interoperability between datasets and systems, improve the consistency and utility of generated data, and ultimately facilitate novel insights and discoveries in SARS-CoV-2 and COVID-19. The package is now supported by the NCBI's BioSample database.


Subjects
COVID-19 , SARS-CoV-2 , Genomics , Humans , Metadata , Public Health , Reproducibility of Results
7.
Microb Genom ; 8(12), 2022 12.
Article in English | MEDLINE | ID: mdl-36748524

ABSTRACT

The White-Kauffmann-Le Minor (WKL) scheme is the most widely used Salmonella typing scheme for reporting the disease prevalence of the enteric pathogen. With the advent of whole-genome sequencing (WGS), in silico methods have increasingly replaced traditional serotyping due to reproducibility, speed and coverage. However, despite integrating genomic-based typing by in silico serotyping tools such as SISTR, in silico serotyping in certain contexts remains ambiguous and insufficiently informative. Specifically, in silico serotyping does not attempt to resolve polyphyly. Furthermore, in spite of the widespread acknowledgement of polyphyly from genomic studies, the prevalence of polyphyletic serovars is not well characterized. Here, we applied a genomics approach to acquire the necessary resolution to classify genetically discordant serovars and propose an alternative typing scheme that consistently reflects natural Salmonella populations. By accessing the unprecedented volume of bacterial genomic data publicly available in GenomeTrakr and PubMLST databases (>180 000 genomes representing 723 serovars), we characterized the global Salmonella population structure and systematically identified putative non-monophyletic serovars. The proportion of putative non-monophyletic serovars was estimated to be higher than in previous reports, reinforcing the inability of antigenic determinants to depict the complexity of Salmonella evolutionary history. We explored the extent of genetic diversity masked by serotyping labels and found significant intra-serovar molecular differences across many clinically important serovars. To avoid false discovery due to incorrect in silico serotyping calls, we cross-referenced reported serovar labels and concluded a low error rate in in silico serotyping. The combined application of clustering statistics and genome-wide association methods demonstrated effective characterization of stable bacterial populations and explained functional differences. 
The collective methods adopted in our study have practical value in establishing genomic-based typing nomenclatures for an entire microbial species or closely related subpopulations. Ultimately, we foresee an improved typing scheme to be a hybrid that integrates both genomic and antigenic information such that the resolution from WGS is leveraged to improve the precision of subpopulation classification while preserving the common names defined by the WKL scheme.


Subjects
Salmonella enterica , Salmonella enterica/genetics , Reproducibility of Results , Genome-Wide Association Study , Salmonella/genetics , Genomics
8.
Front Genet ; 12: 716541, 2021.
Article in English | MEDLINE | ID: mdl-35401651

ABSTRACT

COVID-19 was declared to be a pandemic in March 2020 by the World Health Organization. Timely sharing of viral genomic sequencing data accompanied by a minimal set of contextual data is essential for informing regional, national, and international public health responses. Such contextual data is also necessary for developing and improving clinical therapies and vaccines, and enhancing the scientific community's understanding of the SARS-CoV-2 virus. The Canadian COVID-19 Genomics Network (CanCOGeN) was launched in April 2020 to coordinate and upscale existing genomics-based COVID-19 research and surveillance efforts. CanCOGeN is performing large-scale sequencing of both the genomes of SARS-CoV-2 virus samples (VirusSeq) and affected Canadians (HostSeq). This paper addresses the privacy concerns associated with sharing the viral sequence data with a pre-defined set of contextual data describing the sample source and case attribute of the sequence data in the Canadian context. Currently, the viral genome sequences are shared by provincial public health laboratories and their healthcare and academic partners, with the Canadian National Microbiology Laboratory and with publicly accessible databases. However, data sharing delays and the provision of incomplete contextual data often occur because publicly releasing such data triggers privacy and data governance concerns. The CanCOGeN Ethics and Governance Expert Working Group thus has investigated several privacy issues cited by CanCOGeN data providers/stewards. This paper addresses these privacy concerns and offers insights primarily in the Canadian context, although similar privacy considerations also exist in other jurisdictions. We maintain that sharing viral sequencing data and its limited associated contextual data in the public domain generally does not pose insurmountable privacy challenges. 
However, privacy risks associated with reidentification should be actively monitored due to advancements in reidentification methods and the evolving pandemic landscape. We also argue that during a global health emergency such as COVID-19, privacy should not be used as a blanket measure to prevent such genomic data sharing due to the significant benefits it provides towards public health responses and ongoing research activities.

9.
Nucleic Acids Res ; 48(D1): D517-D525, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31665441

ABSTRACT

The Comprehensive Antibiotic Resistance Database (CARD; https://card.mcmaster.ca) is a curated resource providing reference DNA and protein sequences, detection models and bioinformatics tools on the molecular basis of bacterial antimicrobial resistance (AMR). CARD focuses on providing high-quality reference data and molecular sequences within a controlled vocabulary, the Antibiotic Resistance Ontology (ARO), designed by the CARD biocuration team to integrate with software development efforts for resistome analysis and prediction, such as CARD's Resistance Gene Identifier (RGI) software. Since 2017, CARD has expanded through extensive curation of reference sequences, revision of the ontological structure, curation of over 500 new AMR detection models, development of a new classification paradigm and expansion of analytical tools. Most notably, a new Resistomes & Variants module provides analysis and statistical summary of in silico predicted resistance variants from 82 pathogens and over 100 000 genomes. By adding these resistance variants to CARD, we are able to summarize predicted resistance using the information included in CARD, identify trends in AMR mobility and determine previously undescribed and novel resistance variants. Here, we describe updates and recent expansions to CARD and its biocuration process, including new resources for community biocuration of AMR molecular reference data.


Subjects
Databases, Genetic , Drug Resistance, Bacterial , Genes, Bacterial , Software , Bacteria/drug effects , Bacteria/genetics , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Bacterial Proteins/metabolism
10.
NPJ Sci Food ; 2: 23, 2018.
Article in English | MEDLINE | ID: mdl-31304272

ABSTRACT

The construction of high capacity data sharing networks to support increasing government and commercial data exchange has highlighted a key roadblock: the content of existing Internet-connected information remains siloed due to a multiplicity of local languages and data dictionaries. This lack of a digital lingua franca is obvious in the domain of human food as materials travel from their wild or farm origin, through processing and distribution chains, to consumers. Well defined, hierarchical vocabulary, connected with logical relationships (in other words, an ontology), is urgently needed to help tackle data harmonization problems that span the domains of food security, safety, quality, production, distribution, and consumer health and convenience. FoodOn (http://foodon.org) is a consortium-driven project to build a comprehensive and easily accessible global farm-to-fork ontology about food, that accurately and consistently describes foods commonly known in cultures from around the world. FoodOn addresses food product terminology gaps and supports food traceability. Focusing on human and domesticated animal food description, FoodOn contains animal and plant food sources, food categories and products, and other facets like preservation processes, contact surfaces, and packaging. Much of FoodOn's vocabulary comes from transforming LanguaL, a mature and popular food indexing thesaurus, into a World Wide Web Consortium (W3C) OWL Web Ontology Language-formatted vocabulary that provides system interoperability, quality control, and software-driven intelligence. FoodOn complements other technologies facilitating food traceability, which is becoming critical in this age of increasing globalization of food networks.

11.
Front Microbiol ; 8: 1068, 2017.
Article in English | MEDLINE | ID: mdl-28694792

ABSTRACT

Globalization of food networks increases opportunities for the spread of foodborne pathogens beyond borders and jurisdictions. High resolution whole-genome sequencing (WGS) subtyping of pathogens promises to vastly improve our ability to track and control foodborne disease, but to do so it must be combined with epidemiological, clinical, laboratory and other health care data (called "contextual data") to be meaningfully interpreted for regulatory and health interventions, outbreak investigation, and risk assessment. However, current multi-jurisdictional pathogen surveillance and investigation efforts are complicated by time-consuming data re-entry, curation and integration of contextual information owing to a lack of interoperable standards and inconsistent reporting. A solution to these challenges is the use of 'ontologies' - hierarchies of well-defined and standardized vocabularies interconnected by logical relationships. Terms are specified by universal IDs enabling integration into highly regulated areas and multi-sector sharing (e.g., food and water microbiology with the veterinary sector). Institution-specific terms can be mapped to a given standard at different levels of granularity, maximizing comparability of contextual information according to jurisdictional policies. Fit-for-purpose ontologies provide contextual information with the auditability required for food safety laboratory accreditation. Our research efforts include the development of a Genomic Epidemiology Ontology (GenEpiO), and Food Ontology (FoodOn) that harmonize important laboratory, clinical and epidemiological data fields, as well as existing food resources. These efforts are supported by a global consortium of researchers and stakeholders worldwide. Since foodborne diseases do not respect international borders, uptake of such vocabularies will be crucial for multi-jurisdictional interpretation of WGS results and data sharing.

12.
Am J Infect Control ; 45(2): 170-179, 2017 Feb 01.
Article in English | MEDLINE | ID: mdl-28159067

ABSTRACT

With the growing importance of infectious diseases in health care and communicable disease outbreaks garnering increasing attention, new technologies are playing a greater role in helping us prevent health care-associated infections and provide optimal public health. The microbiology laboratory has always played a large role in infection control by providing tools to identify, characterize, and track pathogens. Recently, advances in DNA sequencing technology have ushered in a new era of genomic epidemiology, where traditional molecular diagnostics and genotyping methods are being enhanced and even replaced by genomics-based methods to aid epidemiologic investigations of communicable diseases. The ability to analyze and compare entire pathogen genomes has allowed for unprecedented resolution into how and why infectious diseases spread. As these genomics-based methods continue to improve in speed, cost, and accuracy, they will be increasingly used to inform and guide infection control and public health practices.


Subjects
Cross Infection/diagnosis , Cross Infection/prevention & control , Infection Control/methods , Molecular Epidemiology/methods , Genotyping Techniques/methods , High-Throughput Nucleotide Sequencing/methods , Humans
13.
Bioinformatics ; 32(8): 1275-7, 2016 04 15.
Article in English | MEDLINE | ID: mdl-26656932

ABSTRACT

MOTIVATION: There are various reasons for rerunning bioinformatics tools and pipelines on sequencing data, including reproducing a past result, validation of a new tool or workflow using a known dataset, or tracking the impact of database changes. For identical results to be achieved, regularly updated reference sequence databases must be versioned and archived. Database administrators have tried to fill the requirements by supplying users with one-off versions of databases, but these are time consuming to set up and are inconsistent across resources. Disk storage and data backup performance has also discouraged maintaining multiple versions of databases since databases such as NCBI nr can consume 50 Gb or more disk space per version, with growth rates that parallel Moore's law. RESULTS: Our end-to-end solution combines our own Kipper software package (a simple key-value large file versioning system) with BioMAJ (software for downloading sequence databases) and Galaxy (a web-based bioinformatics data processing platform). Available versions of databases can be recalled and used by command-line and Galaxy users. The Kipper data store format makes publishing curated FASTA databases convenient since in most cases it can store a range of versions into a file marginally larger than the size of the latest version. AVAILABILITY AND IMPLEMENTATION: Kipper v1.0.0 and the Galaxy Versioned Data tool are written in Python and released as free and open source software available at https://github.com/Public-Health-Bioinformatics/kipper and https://github.com/Public-Health-Bioinformatics/versioned_data, respectively; detailed setup instructions can be found at https://github.com/Public-Health-Bioinformatics/versioned_data/blob/master/doc/setup.md CONTACT: Damion.Dooley@Bccdc.Ca or William.Hsiao@Bccdc.Ca SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
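One way a key-value store can hold many versions in little more space than the latest one is to record, for each key, the version in which a value appeared and the version in which it disappeared. The sketch below illustrates that general idea only; the class `VersionedStore` and its methods are hypothetical and do not reproduce Kipper's actual file format or command-line interface.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    """Toy key-value store that keeps every version without full copies."""
    # key -> list of (created_in, deleted_in or None, value)
    rows: dict = field(default_factory=dict)
    latest: int = 0

    def commit(self, snapshot: dict) -> int:
        """Record a full snapshot as the next version; store only the changes."""
        self.latest += 1
        version = self.latest
        # Close any live row whose key vanished or whose value changed.
        for key, history in self.rows.items():
            created, deleted, value = history[-1]
            if deleted is None and snapshot.get(key) != value:
                history[-1] = (created, version, value)
        # Open a row for every key that is new or was just closed.
        for key, value in snapshot.items():
            history = self.rows.setdefault(key, [])
            if not history or history[-1][1] is not None:
                history.append((version, None, value))
        return version

    def recall(self, version: int) -> dict:
        """Rebuild the snapshot exactly as it was at the given version."""
        result = {}
        for key, history in self.rows.items():
            for created, deleted, value in history:
                if created <= version and (deleted is None or deleted > version):
                    result[key] = value
        return result


store = VersionedStore()
v1 = store.commit({"seq1": "ACGT", "seq2": "GGCC"})
v2 = store.commit({"seq1": "ACGT", "seq3": "TTAA"})
v3 = store.commit({"seq1": "ACGA", "seq3": "TTAA"})
```

Unchanged entries such as `seq1` between the first two commits are stored once, which is why a reference database whose records mostly persist between releases can keep many versions cheaply under this kind of scheme.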


Subjects
Computational Biology , Databases, Nucleic Acid , Software , User-Computer Interface
14.
Appl Environ Microbiol ; 81(14): 4827-34, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25956776

ABSTRACT

Giardia is the most common parasitic cause of gastrointestinal infections worldwide, with transmission through surface water playing an important role in various parts of the world. Giardia duodenalis (synonyms: G. intestinalis and G. lamblia), a multispecies complex, has two zoonotic subtypes, assemblages A and B. When British Columbia (BC), a western Canadian province, experienced several waterborne giardiasis outbreaks due to unfiltered surface drinking water in the late 1980s, collection of isolates from surface water, as well as from humans and beavers (Castor canadensis), throughout the province was carried out. To better understand Giardia in surface water, 71 isolates, including 29 from raw surface water samples, 29 from human giardiasis cases, and 13 from beavers in watersheds from this historical library were characterized by PCR. Study isolates also included isolates from waterborne giardiasis outbreaks. Both assemblages A and B were identified in surface water, human, and beaver samples, including a mixture of both assemblages A and B in waterborne outbreaks. PCR results were confirmed by whole-genome sequencing (WGS) for one waterborne outbreak and supported the clustering of human, water, and beaver isolates within both assemblages. We concluded that contamination of surface water by Giardia is complex, that the majority of our surface water isolates were assemblage B, and that both assemblages A and B may cause waterborne outbreaks. The higher-resolution data provided by WGS warrants further study to better understand the spread of Giardia.


Subjects
Fresh Water/parasitology , Giardia lamblia/classification , Giardia lamblia/isolation & purification , British Columbia , Genome, Protozoan , Genotype , Giardia lamblia/genetics , Giardiasis/parasitology , Humans , Molecular Sequence Data , Phylogeny , Polymerase Chain Reaction
15.
Front Microbiol ; 6: 1405, 2015.
Article in English | MEDLINE | ID: mdl-26733955

ABSTRACT

Select bacteria, such as Escherichia coli or coliforms, have been widely used as sentinels of low water quality; however, there are concerns regarding their predictive accuracy for the protection of human and environmental health. To develop improved monitoring systems, a greater understanding of bacterial community structure, function, and variability across time is required in the context of different pollution types, such as agricultural and urban contamination. Here, we present a year-long survey of free-living bacterial DNA collected from seven sites along rivers in three watersheds with varying land use in Southwestern Canada. This is the first study to examine the bacterial metagenome in flowing freshwater (lotic) environments over such a time span, providing an opportunity to describe bacterial community variability as a function of land use and environmental conditions. Characteristics of the metagenomic data, such as sequence composition and average genome size (AGS), vary with sampling site, environmental conditions, and water chemistry. For example, AGS was correlated with hours of daylight in the agricultural watershed and, across the agriculturally and urban-affected sites, k-mer composition clustering corresponded to nutrient concentrations. In addition to indicating a community shift, this change in AGS has implications in terms of the normalization strategies required, and considerations surrounding such strategies in general are discussed. When comparing abundances of gene functional groups between high- and low-quality water samples collected from an agricultural area, the latter had a higher abundance of nutrient metabolism and bacteriophage groups, possibly reflecting an increase in agricultural runoff. This work presents a valuable dataset representing a year of monthly sampling across watersheds and an analysis targeted at establishing a foundational understanding of how bacterial lotic communities vary across time and land use. 
The results provide important context for future studies, including further analyses of watershed ecosystem health, and the identification and development of biomarkers for improved water quality monitoring systems.

16.
PLoS One ; 8(3): e59484, 2013.
Article in English | MEDLINE | ID: mdl-23544073

ABSTRACT

BACKGROUND: Recent studies on genome assembly from short-read sequencing data reported the limitation of this technology to reconstruct the entire genome even at very high depth coverage. We investigated the limitation from the perspective of information theory to evaluate the effect of repeats on short-read genome assembly using idealized (error-free) reads at different lengths. METHODOLOGY/PRINCIPAL FINDINGS: We define a metric H(k) to be the entropy of sequencing reads at a read length k and use the relative loss of entropy ΔH(k) to measure the impact of repeats for the reconstruction of whole-genome from sequences of length k. In our experiments, we found that entropy loss correlates well with de-novo assembly coverage of a genome, and a score of ΔH(k)>1% indicates a severe loss in genome reconstruction fidelity. The minimal read lengths to achieve ΔH(k)<1% are different for various organisms and are independent of the genome size. For example, in order to meet the threshold of ΔH(k)<1%, a read length of 60 bp is needed for the sequencing of human genome (3.2×10^9 bp) and 320 bp for the sequencing of fruit fly (1.8×10^8 bp). We also calculated the ΔH(k) scores for 2725 prokaryotic chromosomes and plasmids at several read lengths. Our results indicate that the levels of repeats in different genomes are diverse and the entropy of sequencing reads provides a measurement for the repeat structures. CONCLUSIONS/SIGNIFICANCE: The proposed entropy-based measurement, which can be calculated in seconds to minutes in most cases, provides a rapid quantitative evaluation on the limitation of idealized short-read genome sequencing. Moreover, the calculation can be parallelized to scale up to large eukaryotic genomes. This approach may be useful to tune the sequencing parameters to achieve better genome assemblies when a closely related genome is already available.
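The abstract does not give the formulas for H(k) and ΔH(k), so the sketch below rests on two assumptions: H(k) is taken as the Shannon entropy of error-free reads of length k drawn uniformly from every start position on one strand, and ΔH(k) is taken as the shortfall of H(k) from its maximum, log2 of the number of reads, as a fraction of that maximum. The function names are hypothetical, and the paper's exact definitions may differ.

```python
import math
from collections import Counter


def read_entropy(genome: str, k: int) -> float:
    """Shannon entropy in bits of all error-free reads of length k (assumed H(k))."""
    total = len(genome) - k + 1
    if k < 1 or total < 1:
        raise ValueError("k must be between 1 and the genome length")
    counts = Counter(genome[i:i + k] for i in range(total))
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def relative_entropy_loss(genome: str, k: int) -> float:
    """Assumed form of delta-H(k): fraction of the maximum entropy lost to repeats."""
    total = len(genome) - k + 1
    h_observed = read_entropy(genome, k)  # also validates k
    h_max = math.log2(total)  # reached only when every read is unique
    if h_max == 0:
        return 0.0
    return (h_max - h_observed) / h_max


# Toy sequence made of a repeated 4-base unit.
toy = "ACGTACGT"
loss_short = relative_entropy_loss(toy, 2)  # reads shorter than the repeat
loss_long = relative_entropy_loss(toy, 5)   # reads longer than the repeat
```

On the toy sequence, reads of length 2 collide inside the repeat and lose entropy, while reads of length 5 span the repeat unit and are all distinct, so the loss falls to zero. This mirrors the abstract's point that a minimum read length exists below which repeats degrade reconstruction.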


Subjects
Entropy , Genome/genetics , Nucleic Acid Repetitive Sequences/genetics , DNA Sequence Analysis/methods , Animals , Bacteria/genetics , Base Pairing/genetics , Base Sequence , Chromosomes/genetics , Bacterial Artificial Chromosomes/genetics , Humans , Prokaryotic Cells/metabolism
17.
Bioinformatics ; 29(8): 1004-10, 2013 Apr 15.
Article in English | MEDLINE | ID: mdl-23457040

ABSTRACT

MOTIVATION: High-accuracy de novo assembly of the short sequencing reads from RNA-Seq technology is very challenging. We introduce a de novo assembly algorithm, EBARDenovo, which stands for Extension, Bridging And Repeat-sensing Denovo. This algorithm uses an efficient chimera-detection function to abrogate the effect of aberrant chimeric reads in RNA-Seq data. RESULTS: EBARDenovo resolves the complications of RNA-Seq assembly arising from sequencing errors, repetitive sequences and aberrant chimeric amplicons. In a series of assembly experiments, our algorithm is the most accurate among the examined programs, including de Bruijn graph assemblers, Trinity and Oases. AVAILABILITY AND IMPLEMENTATION: EBARDenovo is available at http://ebardenovo.sourceforge.net/. This software package (with patent pending) is free of charge for academic use only. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Gene Expression Profiling/methods , RNA Sequence Analysis/methods , RNA/chemistry , Nucleic Acid Repetitive Sequences , Software
18.
Bioinformatics ; 28(14): 1947-8, 2012 Jul 15.
Article in English | MEDLINE | ID: mdl-22576174

ABSTRACT

Analysis of microbial genomes often requires organizing and comparing tens to thousands of genomes from both public repositories and unpublished sources. MicrobeDB provides a foundation for such projects by automating the download of published, completed bacterial and archaeal genomes from key sources, parsing the annotations of all genomes (both public and private) into a local database, and allowing interaction with the database through an easy-to-use programming interface. MicrobeDB creates a simple-to-use, easy-to-maintain, centralized local resource for various large-scale comparative genomic analyses and a back-end for future microbial application design. AVAILABILITY: MicrobeDB is freely available under the GNU-GPL at: http://github.com/mlangill/microbedb/


Subjects
Genetic Databases , Archaeal Genome , Bacterial Genome , Computational Biology/methods , Genomics , Internet , Software , User-Computer Interface
19.
BMC Genomics ; 13 Suppl 7: S5, 2012.
Article in English | MEDLINE | ID: mdl-23282223

ABSTRACT

BACKGROUND: Mitochondrial dysfunction is associated with various aging-related diseases. The copy number of mtDNA in human cells may therefore be a potential biomarker for the diagnosis of aging. Here we propose a new computational method for the accurate assessment of mtDNA copy number from whole genome sequencing data. RESULTS: Whole genome sequencing datasets from two human families in the HapMap and 1000 Genomes projects were used for the accurate counting of mitochondrial DNA copy numbers. The results revealed that the parental mitochondrial DNA copy numbers are significantly lower than those of their children in these samples: samples from the children contain 8% to 21% more copies of mtDNA than samples from their parents. The experiment demonstrated possible correlations between the quantity of mitochondrial DNA and aging-related diseases. CONCLUSIONS: Since next-generation sequencing technology strives to deliver affordable and unbiased sequencing results, mtDNA copy numbers can be assessed accurately and effectively from the output of whole genome sequencing. We implemented the method as a software package, MitoCounter, with the source code and user's guide available to the public at http://sourceforge.net/projects/mitocounter/.
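The abstract does not describe how MitoCounter counts copies. The sketch below illustrates the general idea with a common coverage-ratio estimate, and it should not be read as the MitoCounter method. It assumes that mean read depth over the mitochondrial genome and over the diploid autosomes is already known, and estimates copies per cell as ploidy times the depth ratio. The function names and all depth values are hypothetical.

```python
def mtdna_copies_per_cell(mt_depth: float, autosomal_depth: float, ploidy: int = 2) -> float:
    """Coverage-ratio estimate: each cell holds `ploidy` autosomal copies,
    so mtDNA copies per cell is roughly ploidy * (mt depth / autosomal depth)."""
    if mt_depth < 0 or autosomal_depth <= 0:
        raise ValueError("depths must be non-negative and autosomal depth positive")
    return ploidy * mt_depth / autosomal_depth


def relative_excess(child_copies: float, parent_copies: float) -> float:
    """Fractional excess of the child's copy number over the parent's."""
    if parent_copies <= 0:
        raise ValueError("parent copy number must be positive")
    return (child_copies - parent_copies) / parent_copies


if __name__ == "__main__":
    # Hypothetical mean depths, chosen only to illustrate the arithmetic.
    parent = mtdna_copies_per_cell(mt_depth=3000.0, autosomal_depth=30.0)
    child = mtdna_copies_per_cell(mt_depth=3450.0, autosomal_depth=30.0)
    print(parent, child, relative_excess(child, parent))
```

A comparison such as the 8% to 21% difference reported above corresponds to `relative_excess` values between 0.08 and 0.21.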


Subjects
Mitochondrial DNA/metabolism , Human Genome , Mitochondria/genetics , Adult , Child , Genetic Databases , Female , Humans , Male , DNA Sequence Analysis , Software
20.
Am J Infect Control ; 38(9): 751-3, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20570393

ABSTRACT

In this study, we identified critically ill patients with Acinetobacter baumannii bacteremia and examined perirectal surveillance cultures for the presence of genetically related A baumannii strains using pulsed-field gel electrophoresis to determine whether gut colonization preceded clinical infection. Seven patients with imipenem-resistant A baumannii bacteremia were identified from January to June of 2008. Six of 7 (86%) patients were colonized in the gastrointestinal tract with genetically similar strains preceding their bacteremia.


Subjects
Acinetobacter Infections/microbiology , Acinetobacter baumannii/classification , Bacteremia/microbiology , Bacterial Typing Techniques , Blood/microbiology , Gastrointestinal Tract/microbiology , Acinetobacter baumannii/drug effects , Acinetobacter baumannii/isolation & purification , Adult , Aged , Anti-Bacterial Agents/pharmacology , Critical Illness , Cross Infection/microbiology , DNA Fingerprinting , Pulsed-Field Gel Electrophoresis , Female , Humans , Imipenem/pharmacology , Male , Middle Aged , Molecular Epidemiology , beta-Lactam Resistance
...