ABSTRACT
The Comprehensive Antibiotic Resistance Database (CARD; card.mcmaster.ca) combines the Antibiotic Resistance Ontology (ARO) with curated AMR gene (ARG) sequences and resistance-conferring mutations to provide an informatics framework for annotation and interpretation of resistomes. As of version 3.2.4, CARD encompasses 6627 ontology terms, 5010 reference sequences, 1933 mutations, 3004 publications, and 5057 AMR detection models that can be used by the accompanying Resistance Gene Identifier (RGI) software to annotate genomic or metagenomic sequences. Focused curation enhancements since 2020 include expanded β-lactamase curation, incorporation of likelihood-based AMR mutations for Mycobacterium tuberculosis, addition of disinfectants and antiseptics plus their associated ARGs, and systematic curation of resistance-modifying agents. This expanded curation includes 180 new AMR gene families, 15 new drug classes, 1 new resistance mechanism, and two new ontological relationships: evolutionary_variant_of and is_small_molecule_inhibitor. In silico prediction of resistomes and prevalence statistics of ARGs has been expanded to 377 pathogens, 21,079 chromosomes, 2,662 genomic islands, 41,828 plasmids and 155,606 whole-genome shotgun assemblies, resulting in collation of 322,710 unique ARG allele sequences. New features include the CARD:Live collection of community-submitted isolate resistome data and the introduction of standardized 15-character CARD Short Names for ARGs to support machine learning efforts.
Subject(s)
Data Curation; Databases, Factual; Drug Resistance, Microbial; Machine Learning; Anti-Bacterial Agents/pharmacology; Genes, Bacterial; Likelihood Functions; Software; Molecular Sequence Annotation
ABSTRACT
The CHILD Cohort Study is an active multi-center longitudinal, prospective, population pregnancy cohort study following Canadian infants from fetal life until adulthood. We hypothesized that early life physical and psychosocial environments interact with biological factors (e.g. immunologic, genetic, physiologic, and metabolic) influencing burdensome non-communicable disease outcomes, including asthma and allergic disorders, growth and development, cardio-metabolic health, and neurodevelopmental outcomes that manifest during the life-course. Detailed clinical and physiologic phenotyping at strategic intervals was complemented by environmental sampling, actigraphy and global positioning system measures, biological sampling including gut, breastmilk and nasal microbiome, nutritional studies, genetics, and epigenetic profiling. Of 3,454 families recruited from 2008 to 2012, study retention was 96.0% at 1 year, 93.2% at 5 years and 90.7% at 8 years. Data collection during the COVID-19 pandemic was partially completed via virtual visits. A sub-cohort was implemented, capturing detailed information on the prevalence and predictors of SARS-CoV-2 infection and the health and psychosocial impact of the pandemic on Canadian families. The 13-year clinical assessment launched in 2022 will be completed in 2025. Ultimately, the CHILD Cohort Study provides a data science platform designed to enable a deep understanding of early life factors associated with the development of chronic non-communicable diseases and multimorbidity.
ABSTRACT
Protein subcellular localization (SCL) is important for understanding protein function and for genome annotation, and aids identification of potential cell surface diagnostic markers, drug targets, or vaccine components. PSORTdb comprises ePSORTdb, a manually curated database of experimentally verified protein SCLs, and cPSORTdb, a pre-computed database of PSORTb-predicted SCLs for NCBI's RefSeq deduced bacterial and archaeal proteomes. We now report PSORTdb 4.0 (http://db.psort.org/). It features a website refresh, in particular a more user-friendly database search. It also addresses the need to uniquely identify proteins from NCBI genomes now that GI numbers have been retired. It further expands both ePSORTdb and cPSORTdb, including additional data about novel secondary localizations, such as proteins found in bacterial outer membrane vesicles. Protein predictions in cPSORTdb have increased along with the number of available microbial genomes, from approximately 13 million when PSORTdb 3.0 was released, to over 66 million currently. Now, analyses of both complete and draft genomes are included. This expanded database will be of wide use to researchers developing SCL predictors or studying diverse microbes, including medically, agriculturally and industrially important species that have classic or atypical cell envelope structures or vesicles.
Subject(s)
Archaeal Proteins/metabolism; Bacterial Proteins/metabolism; Databases, Protein; Amino Acid Sequence; Archaeal Proteins/chemistry; Bacterial Proteins/chemistry; Cell Wall/chemistry; Protein Transport; Subcellular Fractions/metabolism; User-Computer Interface
ABSTRACT
Antibiotic-resistant superbug bacteria represent a global health problem with no imminent solutions. Here we demonstrate that the combination (termed AB569) of acidified nitrite (A-NO2-) and Na2-EDTA (disodium ethylenediaminetetraacetic acid) inhibited all Gram-negative and Gram-positive bacteria tested. AB569 was also efficacious at killing the model organism Pseudomonas aeruginosa in biofilms and in a murine chronic lung infection model. AB569 was not toxic to human cell lines at bactericidal concentrations using a basic viability assay. RNA-Seq analyses upon treatment of P. aeruginosa with AB569 revealed a catastrophic loss of the ability to support core pathways encompassing DNA, RNA, protein, ATP biosynthesis, and iron metabolism. Electrochemical analyses elucidated that AB569 produced more stable SNO proteins, potentially explaining one mechanism of bacterial killing. Our data indicate that AB569 is a safe and effective means to kill pathogenic bacteria, suggesting that simple strategies could be applied with highly advantageous therapeutic/toxicity index ratios to pathogens associated with a myriad of periepithelial infections and related disease scenarios.
Subject(s)
Anti-Bacterial Agents/chemistry; Anti-Bacterial Agents/pharmacology; Bacteria/drug effects; Edetic Acid/pharmacology; Sodium Nitrite/pharmacology; Animals; Anti-Bacterial Agents/therapeutic use; Biofilms/drug effects; Disease Models, Animal; Down-Regulation; Drug Resistance, Bacterial/drug effects; Edetic Acid/chemistry; Lung Diseases/drug therapy; Lung Diseases/microbiology; Metabolic Networks and Pathways; Mice; Nitrites/chemistry; Nitrites/pharmacology; Pseudomonas aeruginosa/drug effects
ABSTRACT
The Comprehensive Antibiotic Resistance Database (CARD; https://card.mcmaster.ca) is a curated resource providing reference DNA and protein sequences, detection models and bioinformatics tools on the molecular basis of bacterial antimicrobial resistance (AMR). CARD focuses on providing high-quality reference data and molecular sequences within a controlled vocabulary, the Antibiotic Resistance Ontology (ARO), designed by the CARD biocuration team to integrate with software development efforts for resistome analysis and prediction, such as CARD's Resistance Gene Identifier (RGI) software. Since 2017, CARD has expanded through extensive curation of reference sequences, revision of the ontological structure, curation of over 500 new AMR detection models, development of a new classification paradigm and expansion of analytical tools. Most notably, a new Resistomes & Variants module provides analysis and statistical summary of in silico predicted resistance variants from 82 pathogens and over 100 000 genomes. By adding these resistance variants to CARD, we are able to summarize predicted resistance using the information included in CARD, identify trends in AMR mobility and determine previously undescribed and novel resistance variants. Here, we describe updates and recent expansions to CARD and its biocuration process, including new resources for community biocuration of AMR molecular reference data.
Subject(s)
Databases, Genetic; Drug Resistance, Bacterial; Genes, Bacterial; Software; Bacteria/drug effects; Bacteria/genetics; Bacterial Proteins/chemistry; Bacterial Proteins/genetics; Bacterial Proteins/metabolism
ABSTRACT
Horizontal gene transfer (also called lateral gene transfer) is a major mechanism for microbial genome evolution, enabling rapid adaptation and survival in specific niches. Genomic islands (GIs), commonly defined as clusters of bacterial or archaeal genes of probable horizontal origin, are of particular medical, environmental and/or industrial interest, as they disproportionately encode virulence factors and some antimicrobial resistance genes and may harbor entire metabolic pathways that confer a specific adaptation (solvent resistance, symbiosis properties, etc.). As large-scale analyses of microbial genomes increase, such as for genomic epidemiology investigations of infectious disease outbreaks in public health, there is increased appreciation of the need to accurately predict and track GIs. Over the past decade, numerous computational tools have been developed to tackle the challenges inherent in accurate GI prediction. We review here the main types of GI prediction methods and discuss their advantages and limitations for a routine analysis of microbial genomes in this era of rapid whole-genome sequencing. An assessment is provided of 20 GI prediction software methods that use sequence-composition bias to identify the GIs, using a reference GI data set from 104 genomes obtained using an independent comparative genomics approach. Finally, we present guidelines to assist researchers in effectively identifying these key genomic regions.
Subject(s)
Genome, Bacterial; Databases, Genetic; Evolution, Molecular; Gene Transfer, Horizontal; Machine Learning
ABSTRACT
MOTIVATION: Many methods for microbial protein subcellular localization (SCL) prediction exist; however, none is readily available for analysis of metagenomic sequence data, despite growing interest from researchers studying microbial communities in humans, agri-food relevant organisms and in other environments (e.g. for identification of cell-surface biomarkers for rapid protein-based diagnostic tests). We wished to also identify new markers of water quality from freshwater samples collected from pristine versus pollution-impacted watersheds. RESULTS: We report PSORTm, the first bioinformatics tool designed for prediction of diverse bacterial and archaeal protein SCL from metagenomics data. PSORTm incorporates components of PSORTb, one of the most precise and widely used protein SCL predictors, with an automated classification by cell envelope. An evaluation using 5-fold cross-validation with in silico-fragmented sequences with known localization showed that PSORTm maintains PSORTb's high precision, while sensitivity increases proportionately with metagenomic sequence fragment length. PSORTm's read-based analysis was similar to PSORTb-based analysis of metagenome-assembled genomes (MAGs); however, the latter requires non-trivial manual classification of each MAG by cell envelope, and cannot make use of unassembled sequences. Analysis of the watershed samples revealed the importance of normalization and identified potential biomarkers of water quality. This method should be useful for examining a wide range of microbial communities, including human microbiomes, and other microbiomes of medical, environmental or industrial importance. AVAILABILITY AND IMPLEMENTATION: Documentation, source code and docker containers are available for running PSORTm locally at https://www.psort.org/psortm/ (freely available, open-source software under GNU General Public License Version 3). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Archaea; Metagenomics; Archaea/genetics; Bacteria/genetics; Humans; Metagenome; Software
ABSTRACT
BACKGROUND: Forty percent of the world's population live in areas where they are at risk from dengue fever, dengue hemorrhagic fever, and dengue shock syndrome. Dengue viruses are transmitted primarily by the mosquito Aedes aegypti. In Cali, Colombia, approximately 30% of field collected Ae. aegypti are naturally refractory to all four dengue serotypes. OBJECTIVES: Use RNA-sequencing to identify those genes that determine refractoriness in feral mosquitoes to dengue. This information can be used in gene editing strategies to reduce dengue transmission. METHODS: We employed a full factorial design, analyzing differential gene expression across time (24, 36 and 48 h post bloodmeal), feeding treatment (blood or blood + dengue-2) and strain (susceptible or refractory). Sequences were aligned to the reference Ae. aegypti genome for identification, assembled to visualize transcript structure, and analyzed for dynamic gene expression changes. A variety of clustering techniques was used to identify the differentially expressed genes. FINDINGS: We identified a subset of genes that likely assist dengue entry and replication in susceptible mosquitoes and contribute to vector competence. MAIN CONCLUSIONS: The differential expression of specific genes by refractory and susceptible mosquitoes could determine the phenotype, and may be used in gene editing strategies to reduce dengue transmission.
Subject(s)
Aedes; Dengue Virus; Dengue; Aedes/genetics; Animals; Colombia; Dengue Virus/genetics; Mosquito Vectors/genetics; RNA; Transcriptome/genetics
ABSTRACT
Motivation: Genomic islands (GIs) are clusters of genes of probable horizontal origin that play a major role in bacterial and archaeal genome evolution and microbial adaptability. They are of high medical and industrial interest, due to their enrichment in virulence factors, some antimicrobial resistance genes and adaptive metabolic pathways. The development of more sensitive but precise prediction tools, using either sequence composition-based methods or comparative genomics, is needed as large-scale analyses of microbial genomes increase. Results: IslandPath-DIMOB, a leading GI prediction tool in the IslandViewer webserver, has now been significantly improved by modifying both the decision algorithm to determine sequence composition biases, and the underlying database of HMM profiles for associated mobility genes. The accuracy of IslandPath-DIMOB and other major software has been assessed using a reference GI dataset predicted by comparative genomics, plus a manually curated dataset from literature review. Compared to the previous version (v0.2.0), this IslandPath-DIMOB v1.0.0 achieves 11.7% and 5.3% increases in recall and precision, respectively. IslandPath-DIMOB has the highest Matthews correlation coefficient among individual prediction methods tested, combining one of the highest recall measures (46.9%) at high precision (87.4%). The only method with higher recall had notably lower precision (55.1%). This new IslandPath-DIMOB v1.0.0 will facilitate more accurate studies of GIs, including their key roles in microbial adaptability of medical, environmental and industrial interest. Availability and implementation: IslandPath-DIMOB v1.0.0 is freely available through the IslandViewer webserver (http://www.pathogenomics.sfu.ca/islandviewer/) and as standalone software (https://github.com/brinkmanlab/islandpath/) under the GNU-GPLv3. Supplementary information: Supplementary data are available at Bioinformatics online.
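For readers unfamiliar with the accuracy measures named in this abstract, the sketch below shows how precision, recall and the Matthews correlation coefficient are conventionally computed from a confusion matrix. It is illustrative only: the function name gi_metrics and the counts are hypothetical, and the abstract does not state at what unit (nucleotide, gene or island) the authors scored predictions.

```python
import math


def gi_metrics(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, MCC) from confusion-matrix counts.

    tp/fp/tn/fn are counts of whatever unit is being scored; the unit
    is an assumption here, not something taken from the abstract.
    """
    # Precision: fraction of predicted-positive units that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall (sensitivity): fraction of truly positive units that were predicted.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc


if __name__ == "__main__":
    # Hypothetical counts chosen only to exercise the function.
    p, r, m = gi_metrics(tp=50, fp=10, tn=930, fn=10)
    print(f"precision={p:.3f} recall={r:.3f} mcc={m:.3f}")
```

Because the MCC uses all four cells of the confusion matrix, it penalizes a method that gains recall by over-predicting, which is consistent with the trade-off the abstract describes between the highest-recall method and its lower precision.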
Subject(s)
Genomic Islands; Genomics/methods; Sequence Analysis, DNA/methods; Software; Algorithms; Archaea/genetics; Bacteria/genetics; Base Composition; Genome, Archaeal; Genome, Bacterial
ABSTRACT
IslandViewer (http://www.pathogenomics.sfu.ca/islandviewer/) is a widely-used webserver for the prediction and interactive visualization of genomic islands (GIs, regions of probable horizontal origin) in bacterial and archaeal genomes. GIs disproportionately encode factors that enhance the adaptability and competitiveness of the microbe within a niche, including virulence factors and other medically or environmentally important adaptations. We report here the release of IslandViewer 4, with novel features to accommodate the needs of larger-scale microbial genomics analysis, while expanding GI predictions and improving its flexible visualization interface. A user management web interface as well as an HTTP API for batch analyses are now provided with a secured authentication to facilitate the submission of larger numbers of genomes and the retrieval of results. In addition, IslandViewer's integrated GI predictions from multiple methods have been improved and expanded by integrating the precise Islander method for pre-computed genomes, as well as an updated IslandPath-DIMOB for both pre-computed and user-supplied custom genome analysis. Finally, pre-computed predictions including virulence factors and antimicrobial resistance are now available for 6193 complete bacterial and archaeal strains publicly available in RefSeq. IslandViewer 4 provides key enhancements to facilitate the analysis of GIs and better understand their role in the evolution of successful environmental microbes and pathogens.
Subject(s)
Genome, Archaeal; Genome, Bacterial; Genomic Islands; Software; Datasets as Topic; Genes, Archaeal; Genes, Bacterial; Genomics; Internet; User-Computer Interface
ABSTRACT
The Comprehensive Antibiotic Resistance Database (CARD; http://arpcard.mcmaster.ca) is a manually curated resource containing high quality reference data on the molecular basis of antimicrobial resistance (AMR), with an emphasis on the genes, proteins and mutations involved in AMR. CARD is ontologically structured, model centric, and spans the breadth of AMR drug classes and resistance mechanisms, including intrinsic, mutation-driven and acquired resistance. It is built upon the Antibiotic Resistance Ontology (ARO), a custom built, interconnected and hierarchical controlled vocabulary allowing advanced data sharing and organization. Its design allows the development of novel genome analysis tools, such as the Resistance Gene Identifier (RGI) for resistome prediction from raw genome sequence. Recent improvements include extensive curation of additional reference sequences and mutations, development of a unique Model Ontology and accompanying AMR detection models to power sequence analysis, new visualization tools, and expansion of the RGI for detection of emergent AMR threats. CARD curation is updated monthly based on an interplay of manual literature curation, computational text mining, and genome analysis.
Subject(s)
Computational Biology/methods; Databases, Genetic; Drug Resistance, Microbial; Microbiology; Biological Ontologies; Data Curation; Web Browser
ABSTRACT
BACKGROUND: Understanding the RNA processing of an organism's transcriptome is an essential but challenging step in understanding its biology. Here we investigate with unprecedented detail the transcriptome of Pseudomonas aeruginosa PAO1, a medically important and innately multi-drug resistant bacterium. We systematically mapped RNA cleavage and dephosphorylation sites that result in 5'-monophosphate terminated RNA (pRNA) using monophosphate RNA-Seq (pRNA-Seq). Transcriptional start sites (TSS) were also mapped using differential RNA-Seq (dRNA-Seq) and both datasets were compared to conventional RNA-Seq performed in a variety of growth conditions. RESULTS: The pRNA-Seq library revealed known tRNA, rRNA and transfer-messenger RNA (tmRNA) processing sites, together with previously uncharacterized RNA cleavage events that were found disproportionately near the 5' ends of transcripts associated with basic bacterial functions such as oxidative phosphorylation and purine metabolism. The majority (97%) of the processed mRNAs were cleaved at precise codon positions within defined sequence motifs indicative of distinct endonucleolytic activities. The most abundant of these motifs corresponded closely to an E. coli RNase E site previously established in vitro. Using the dRNA-Seq library, we performed an operon analysis and predicted 3159 potential TSS. A correlation analysis uncovered 105 antiparallel pairs of TSS that were separated by 18 bp from each other and were centered on single palindromic TAT(A/T)ATA motifs (likely -10 promoter elements), suggesting that, consistent with previous in vitro experimentation, these sites can initiate transcription bi-directionally and may thus provide a novel form of transcriptional regulation. TSS and RNA-Seq analysis allowed us to confirm expression of small non-coding RNAs (ncRNAs), many of which are differentially expressed in swarming and biofilm formation conditions.
CONCLUSIONS: This study uses pRNA-Seq, a method that provides a genome-wide survey of RNA processing, to study the bacterium Pseudomonas aeruginosa and discover extensive transcript processing not previously appreciated. We have also gained novel insight into RNA maturation and turnover as well as a potential novel form of transcription regulation. NOTE: All sequence data has been submitted to the NCBI sequence read archive. Accession numbers are as follows: [NCBI sequence read archive: SRX156386, SRX157659, SRX157660, SRX157661, SRX157683 and SRX158075]. The sequence data is viewable using Jbrowse on www.pseudomonas.com .
Subject(s)
Genome, Bacterial; Pseudomonas aeruginosa/genetics; RNA Processing, Post-Transcriptional; RNA, Bacterial/genetics; Transcription Initiation Site; Chromosome Mapping; High-Throughput Nucleotide Sequencing; Promoter Regions, Genetic; Pseudomonas aeruginosa/growth & development; Sequence Analysis, RNA
ABSTRACT
Protein subcellular localization (SCL) is important for understanding protein function and for genome annotation, and has practical applications such as identification of potential vaccine components or diagnostic/drug targets. PSORTdb (http://db.psort.org) comprises manually curated SCLs for proteins which have been experimentally verified (ePSORTdb), as well as pre-computed SCL predictions for deduced proteomes from bacterial and archaeal complete genomes available from NCBI (cPSORTdb). We now report PSORTdb 3.0. It features improvements increasing user-friendliness, and further expands both ePSORTdb and cPSORTdb with a focus on improving protein SCL data in cases where it is most difficult: proteins associated with non-classical Gram-positive/Gram-negative/Gram-variable cell envelopes. ePSORTdb data curation was expanded, including adding in additional cell envelope localizations, and incorporating markers for cPSORTdb to automatically identify whether new genomes to be analysed fall into certain atypical cell envelope categories (i.e. Deinococcus-Thermus, Thermotogae, Corynebacteriales/Corynebacterineae, including Mycobacteria). The number of predicted proteins in cPSORTdb has increased from 3,700,000 when PSORTdb 2.0 was released to over 13,000,000 currently. PSORTdb 3.0 will be of wider use to researchers studying a greater diversity of monoderm or diderm microbes, including medically, agriculturally and industrially important species that have non-classical outer membranes or other cell envelope features.
Subject(s)
Archaeal Proteins/genetics; Bacterial Proteins/genetics; Databases, Protein; Membrane Proteins/genetics; Archaeal Proteins/analysis; Bacterial Proteins/analysis; Cell Membrane/chemistry; Cell Wall/chemistry; Genome, Archaeal; Genome, Bacterial; Membrane Proteins/analysis
ABSTRACT
The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches.
Subject(s)
Databases, Genetic; Genome, Bacterial; Molecular Sequence Annotation; Pseudomonas/genetics; Bacterial Proteins/analysis; Bacterial Proteins/chemistry; Drug Resistance, Bacterial/genetics; Gene Ontology; Genomic Islands; Internet; Pseudomonas/drug effects; Pseudomonas/pathogenicity; Virulence Factors
ABSTRACT
IslandViewer (http://pathogenomics.sfu.ca/islandviewer) is a widely used web-based resource for the prediction and analysis of genomic islands (GIs) in bacterial and archaeal genomes. GIs are clusters of genes of probable horizontal origin, and are of high interest since they disproportionately encode genes involved in medically and environmentally important adaptations, including antimicrobial resistance and virulence. We now report a major new release of IslandViewer, since the last release in 2013. IslandViewer 3 incorporates a completely new genome visualization tool, IslandPlot, enabling for the first time interactive genome analysis and gene search capabilities using synchronized circular, horizontal and vertical genome views. In addition, more curated virulence factors and antimicrobial resistance genes have been incorporated, and homologs of these genes identified in closely related genomes using strict filters. Pathogen-associated genes have been re-calculated for all pre-computed complete genomes. For user-uploaded genomes to be analysed, IslandViewer 3 can also now handle incomplete genomes, with an improved queuing system on compute nodes to handle user demand. Overall, IslandViewer 3 represents a significant new version of this GI analysis software, with features that may make it more broadly useful for general microbial genome analysis and visualization.
Subject(s)
Genome, Archaeal; Genome, Bacterial; Genomic Islands; Software; Computer Graphics; Drug Resistance, Microbial/genetics; Genomics; Internet; Molecular Sequence Annotation; Virulence Factors/genetics
ABSTRACT
MOTIVATION: A simple static image of genomes and associated metadata is very limiting, as researchers expect rich, interactive tools similar to the web applications found in the post-Web 2.0 world. GenomeD3Plot is a lightweight visualization library written in JavaScript using the D3 library. GenomeD3Plot provides a rich API to allow the rapid visualization of complex genomic data using a convenient standards-based JSON configuration file. When integrated into existing web services, GenomeD3Plot allows researchers to interact with data, dynamically alter the view, or even resize or reposition the visualization in their browser window. In addition, GenomeD3Plot has built-in functionality to export any resulting genome visualization in PNG or SVG format for easy inclusion in manuscripts or presentations. RESULTS: GenomeD3Plot is being utilized in the recently released IslandViewer 3 (www.pathogenomics.sfu.ca/islandviewer/) to visualize predicted genomic islands with other genome annotation data. However, its features enable it to be more widely applicable for dynamic visualization of genomic data in general. AVAILABILITY AND IMPLEMENTATION: GenomeD3Plot is licensed under the GNU-GPL v3 at https://github.com/brinkmanlab/GenomeD3Plot/. CONTACT: brinkman@sfu.ca.
Subject(s)
Computational Biology/methods; Computer Graphics; Genome, Human; Internet; Software; Genomic Islands; Genomics/methods; Humans
ABSTRACT
The evolution of metazoans from their choanoflagellate-like unicellular ancestor coincided with the acquisition of novel biological functions to support a multicellular lifestyle, and eventually, the unique cellular and physiological demands of differentiated cell types such as those forming the nervous, muscle and immune systems. In an effort to understand the molecular underpinnings of such metazoan innovations, we carried out a comparative genomics analysis for genes found exclusively in, and widely conserved across, metazoans. Using this approach, we identified a set of 526 core metazoan-specific genes (the 'metazoanome'), approximately 10% of which are largely uncharacterized, 16% of which are associated with known human disease, and 66% of which are conserved in Trichoplax adhaerens, a basal metazoan lacking neurons and other specialized cell types. Global analyses of previously-characterized core metazoan genes suggest a prevalent property, namely that they act as partially redundant modifiers of ancient eukaryotic pathways. Our data also highlight the importance of exaptation of pre-existing genetic tools during metazoan evolution. Expression studies in C. elegans revealed that many metazoan-specific genes, including tubulin folding cofactor E-like (TBCEL/coel-1), are expressed in neurons. We used C. elegans COEL-1 as a representative to experimentally validate the metazoan-specific character of our dataset. We show that coel-1 disruption results in developmental hypersensitivity to the microtubule drug paclitaxel/taxol, and that overexpression of coel-1 has broad effects during embryonic development and perturbs specialized microtubules in the touch receptor neurons (TRNs). In addition, coel-1 influences the migration, neurite outgrowth and mechanosensory function of the TRNs, and functionally interacts with components of the tubulin acetylation/deacetylation pathway.
Together, our findings unveil a conserved molecular toolbox fundamental to metazoan biology that contains a number of neuronally expressed and disease-related genes, and reveal a key role for TBCEL/coel-1 in regulating microtubule function during metazoan development and neuronal differentiation.
Subject(s)
Evolution, Molecular; Microtubule-Associated Proteins/genetics; Microtubules/genetics; Neurons/metabolism; Amino Acid Sequence; Animals; Caenorhabditis elegans/genetics; Caenorhabditis elegans/metabolism; Gene Expression Regulation, Developmental; Homeostasis; Humans; Metabolic Networks and Pathways/genetics; Microtubule-Associated Proteins/metabolism; Microtubules/metabolism; Phylogeny; Placozoa/genetics
ABSTRACT
Background. Streptococcus pneumoniae can cause a wide spectrum of disease, including invasive pneumococcal disease (IPD). From 2005 to 2009 an outbreak of IPD occurred in Western Canada, caused by a S. pneumoniae strain with multilocus sequence type (MLST) 289 and serotype 5. We sought to investigate the incidence of IPD due to this S. pneumoniae strain and to characterize the outbreak in British Columbia using whole-genome sequencing. Methods. IPD was defined according to Public Health Agency of Canada guidelines. Two isolates representing the beginning and end of the outbreak were whole-genome sequenced. The sequences were analyzed for single nucleotide variants (SNVs) and putative genomic islands. Results. The peak of the outbreak in British Columbia was in 2006, when 57% of invasive S. pneumoniae isolates were serotype 5. Comparison of two whole-genome sequenced strains showed only 10 SNVs between them. A 15.5 kb genomic island was identified in outbreak strains, allowing the design of a PCR assay to track the spread of the outbreak strain. Discussion. We show that the serotype 5 MLST 289 strain contains a distinguishing genomic island, which remained genetically consistent over time. Whole-genome sequencing holds great promise for real-time characterization of outbreaks in the future and may allow responses tailored to characteristics identified in the genome.
ABSTRACT
BACKGROUND: The field of metagenomics (study of genetic material recovered directly from an environment) has grown rapidly, with many bioinformatics analysis methods being developed. To ensure appropriate use of such methods, robust comparative evaluation of their accuracy and features is needed. For taxonomic classification of sequence reads, such evaluation should include use of clade exclusion, which better evaluates a method's accuracy when identical sequences are not present in any reference database, as is common in metagenomic analysis. To date, relatively small evaluations have been performed, with evaluation approaches like clade exclusion limited to assessment of new methods by the authors of the given method. What is needed is a rigorous, independent comparison between multiple major methods, using the same in silico and in vitro test datasets, with and without approaches like clade exclusion, to better characterize accuracy under different conditions. RESULTS: An overview of the features of 38 bioinformatics methods is provided, evaluating accuracy with a focus on 11 programs that have reference databases that can be modified and therefore most robustly evaluated with clade exclusion. Taxonomic classification of sequence reads was evaluated using both in silico and in vitro mock bacterial communities. Clade exclusion was used at taxonomic levels from species to class, identifying how well methods perform in progressively more difficult scenarios. A wide range of variability was found in the sensitivity, precision, overall accuracy, and computational demand for the programs evaluated. In experiments where distilled water was spiked with only 11 bacterial species, frequently dozens to hundreds of species were falsely predicted by the most popular programs. The different features of each method (forces predictions or not, etc.) are summarized, and additional analysis considerations discussed.
CONCLUSIONS: The accuracy of shotgun metagenomics classification methods varies widely. No one program clearly outperformed others in all evaluation scenarios; rather, the results illustrate the strengths of different methods for different purposes. Researchers must appreciate method differences, choosing the program best suited for their particular analysis to avoid very misleading results. Use of standardized datasets for method comparisons is encouraged, as is use of mock microbial community controls suitable for a particular metagenomic analysis.
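Clade exclusion and the sensitivity and precision measures named in the abstract above can be sketched as follows. This is a simplified illustration scored at the level of species presence rather than per read, and it is not the authors' evaluation code. The names `exclude_clade` and `sensitivity_precision` and the toy data are all hypothetical.

```python
from typing import Dict, Set, Tuple

def exclude_clade(reference: Dict[str, str], clade: str) -> Dict[str, str]:
    """Remove every reference species whose parent clade equals `clade`.

    `reference` maps species name -> parent clade (genus in this toy case).
    Classifying reads from the excluded clade against the reduced reference
    mimics sequencing an organism with no close relative in the database.
    """
    return {sp: parent for sp, parent in reference.items() if parent != clade}

def sensitivity_precision(predicted: Set[str], truth: Set[str]) -> Tuple[float, float]:
    """Species-level sensitivity (recall) and precision for a mock community."""
    true_pos = len(predicted & truth)
    sensitivity = true_pos / len(truth) if truth else 0.0
    precision = true_pos / len(predicted) if predicted else 0.0
    return sensitivity, precision

# Toy reference database (hypothetical subset).
reference = {
    "Escherichia coli": "Escherichia",
    "Escherichia fergusonii": "Escherichia",
    "Bacillus subtilis": "Bacillus",
}
reduced = exclude_clade(reference, "Escherichia")

# Toy mock community: 4 species truly present, 5 predicted, 3 of them correct.
truth = {"sp_A", "sp_B", "sp_C", "sp_D"}
predicted = {"sp_A", "sp_B", "sp_C", "sp_X", "sp_Y"}
sens, prec = sensitivity_precision(predicted, truth)
print(sorted(reduced), sens, prec)
```

In this toy case the two false predictions lower precision to 0.6 while sensitivity stays at 0.75. The spiked-water result in the abstract is the same effect at larger scale: dozens to hundreds of falsely predicted species against only 11 true ones.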
Subject(s)
Bacteria/genetics , Computational Biology/methods , Computer Simulation , Metagenomics/methods , Base Sequence , Databases, Genetic , Phylogeny , Species Specificity
ABSTRACT
The International Molecular Exchange (IMEx) consortium is an international collaboration between major public interaction data providers to share literature-curation efforts and make a nonredundant set of protein interactions available in a single search interface on a common website (http://www.imexconsortium.org/). Common curation rules have been developed, and a central registry is used to manage the selection of articles to enter into the dataset. We discuss the advantages of such a service to the user, our quality-control measures and our data-distribution practices.