1.
Nucleic Acids Res ; 51(D1): D690-D699, 2023 01 06.
Article En | MEDLINE | ID: mdl-36263822

The Comprehensive Antibiotic Resistance Database (CARD; card.mcmaster.ca) combines the Antibiotic Resistance Ontology (ARO) with curated AMR gene (ARG) sequences and resistance-conferring mutations to provide an informatics framework for annotation and interpretation of resistomes. As of version 3.2.4, CARD encompasses 6627 ontology terms, 5010 reference sequences, 1933 mutations, 3004 publications, and 5057 AMR detection models that can be used by the accompanying Resistance Gene Identifier (RGI) software to annotate genomic or metagenomic sequences. Focused curation enhancements since 2020 include expanded β-lactamase curation, incorporation of likelihood-based AMR mutations for Mycobacterium tuberculosis, addition of disinfectants and antiseptics plus their associated ARGs, and systematic curation of resistance-modifying agents. This expanded curation includes 180 new AMR gene families, 15 new drug classes, 1 new resistance mechanism, and two new ontological relationships: evolutionary_variant_of and is_small_molecule_inhibitor. In silico prediction of resistomes and prevalence statistics of ARGs has been expanded to 377 pathogens, 21,079 chromosomes, 2,662 genomic islands, 41,828 plasmids and 155,606 whole-genome shotgun assemblies, resulting in collation of 322,710 unique ARG allele sequences. New features include the CARD:Live collection of community-submitted isolate resistome data and the introduction of standardized 15-character CARD Short Names for ARGs to support machine learning efforts.


Data Curation , Databases, Factual , Drug Resistance, Microbial , Machine Learning , Anti-Bacterial Agents/pharmacology , Genes, Bacterial , Likelihood Functions , Software , Molecular Sequence Annotation
2.
Microb Genom ; 8(5)2022 05.
Article En | MEDLINE | ID: mdl-35584003

Outbreaks of virulent and/or drug-resistant bacteria have a significant impact on human health and major economic consequences. Genomic islands (GIs; defined as clusters of genes of probable horizontal origin) are of high interest because they disproportionately encode virulence factors, some antimicrobial-resistance (AMR) genes, and other adaptations of medical or environmental interest. While microbial genome sequencing has become rapid and inexpensive, current computational methods for GI analysis are not amenable to rapid, accurate, user-friendly and scalable comparative analysis of sets of related genomes. To help fill this gap, we have developed IslandCompare, an open-source computational pipeline for GI prediction and comparison across several to hundreds of bacterial genomes. A dynamic and interactive visualization strategy displays a bacterial core-genome phylogeny, with bacterial genomes linearly displayed at the phylogenetic tree leaves. Genomes are overlaid with GI predictions and AMR determinants from the Comprehensive Antibiotic Resistance Database (CARD), and regions of similarity between the genomes are also displayed. GI predictions are performed using Sigi-HMM and IslandPath-DIMOB, the two most precise GI prediction tools based on nucleotide composition biases, as well as a novel BLAST-based consistency step to improve cross-genome prediction consistency. GIs across genomes sharing sequence similarity are grouped into clusters, further aiding comparative analysis and visualization of acquisition and loss of mobile GIs in specific sub-clades. IslandCompare is open-source software that is containerized for local use and is also available via a user-friendly, web-based interface (at https://islandcompare.ca) to allow direct use by bioinformaticians, biologists and clinicians.


Genome, Bacterial , Genomic Islands , Bacteria/genetics , Disease Outbreaks , Genomic Islands/genetics , Humans , Phylogeny
3.
Nucleic Acids Res ; 49(D1): D803-D808, 2021 01 08.
Article En | MEDLINE | ID: mdl-33313828

Protein subcellular localization (SCL) is important for understanding protein function and for genome annotation, and it aids identification of potential cell surface diagnostic markers, drug targets, or vaccine components. PSORTdb comprises ePSORTdb, a manually curated database of experimentally verified protein SCLs, and cPSORTdb, a pre-computed database of PSORTb-predicted SCLs for NCBI's RefSeq deduced bacterial and archaeal proteomes. We now report PSORTdb 4.0 (http://db.psort.org/). It features a website refresh, in particular a more user-friendly database search. It also addresses the need to uniquely identify proteins from NCBI genomes now that GI numbers have been retired. It further expands both ePSORTdb and cPSORTdb, including additional data about novel secondary localizations, such as proteins found in bacterial outer membrane vesicles. Protein predictions in cPSORTdb have increased along with the number of available microbial genomes, from approximately 13 million when PSORTdb 3.0 was released, to over 66 million currently. Analyses of both complete and draft genomes are now included. This expanded database will be of wide use to researchers developing SCL predictors or studying diverse microbes, including medically, agriculturally and industrially important species that have either classic or atypical cell envelope structures or vesicles.


Archaeal Proteins/metabolism , Bacterial Proteins/metabolism , Databases, Protein , Amino Acid Sequence , Archaeal Proteins/chemistry , Bacterial Proteins/chemistry , Cell Wall/chemistry , Protein Transport , Subcellular Fractions/metabolism , User-Computer Interface
4.
Bioinformatics ; 36(10): 3043-3048, 2020 05 01.
Article En | MEDLINE | ID: mdl-32108861

MOTIVATION: Many methods for microbial protein subcellular localization (SCL) prediction exist; however, none is readily available for analysis of metagenomic sequence data, despite growing interest from researchers studying microbial communities in humans, agri-food relevant organisms and in other environments (e.g. for identification of cell-surface biomarkers for rapid protein-based diagnostic tests). We wished to also identify new markers of water quality from freshwater samples collected from pristine versus pollution-impacted watersheds. RESULTS: We report PSORTm, the first bioinformatics tool designed for prediction of diverse bacterial and archaeal protein SCL from metagenomics data. PSORTm incorporates components of PSORTb, one of the most precise and widely used protein SCL predictors, with an automated classification by cell envelope. An evaluation using 5-fold cross-validation with in silico-fragmented sequences with known localization showed that PSORTm maintains PSORTb's high precision, while sensitivity increases proportionately with metagenomic sequence fragment length. PSORTm's read-based analysis was similar to PSORTb-based analysis of metagenome-assembled genomes (MAGs); however, the latter requires non-trivial manual classification of each MAG by cell envelope, and cannot make use of unassembled sequences. Analysis of the watershed samples revealed the importance of normalization and identified potential biomarkers of water quality. This method should be useful for examining a wide range of microbial communities, including human microbiomes, and other microbiomes of medical, environmental or industrial importance. AVAILABILITY AND IMPLEMENTATION: Documentation, source code and docker containers are available for running PSORTm locally at https://www.psort.org/psortm/ (freely available, open-source software under GNU General Public License Version 3). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Archaea , Metagenomics , Archaea/genetics , Bacteria/genetics , Humans , Metagenome , Software
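The evaluation described in the PSORTm abstract above (in silico fragmentation of sequences with known localization, then precision and sensitivity measured per fragment length) can be illustrated with a minimal sketch. This is not the authors' code: the function names `fragment` and `precision_sensitivity` are hypothetical, the non-overlapping fragmentation scheme is an assumption, and so is the convention that a predictor may decline to call a localization (represented here as `None`), with precision computed over calls made and sensitivity over all fragments.

```python
from typing import List, Optional, Tuple


def fragment(seq: str, length: int) -> List[str]:
    """Cut seq into consecutive, non-overlapping fragments of exactly
    `length` residues; a shorter trailing piece is dropped.
    (Assumed scheme, for illustration only.)"""
    if length <= 0:
        raise ValueError("length must be positive")
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, length)]


def precision_sensitivity(
    truth: List[str], pred: List[Optional[str]]
) -> Tuple[float, float]:
    """Return (precision, sensitivity) for localization calls.

    pred[i] is None when the (hypothetical) predictor makes no call.
    precision   = correct calls / calls made
    sensitivity = correct calls / all fragments
    """
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    called = [(t, p) for t, p in zip(truth, pred) if p is not None]
    correct = sum(1 for t, p in called if t == p)
    precision = correct / len(called) if called else 0.0
    sensitivity = correct / len(truth) if truth else 0.0
    return precision, sensitivity
```

Under these assumptions, a predictor that abstains on short fragments keeps its precision while losing sensitivity, which matches the qualitative behaviour the abstract reports as fragment length varies.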
5.
Nucleic Acids Res ; 45(W1): W30-W35, 2017 07 03.
Article En | MEDLINE | ID: mdl-28472413

IslandViewer (http://www.pathogenomics.sfu.ca/islandviewer/) is a widely-used webserver for the prediction and interactive visualization of genomic islands (GIs, regions of probable horizontal origin) in bacterial and archaeal genomes. GIs disproportionately encode factors that enhance the adaptability and competitiveness of the microbe within a niche, including virulence factors and other medically or environmentally important adaptations. We report here the release of IslandViewer 4, with novel features to accommodate the needs of larger-scale microbial genomics analysis, while expanding GI predictions and improving its flexible visualization interface. A user management web interface as well as an HTTP API for batch analyses are now provided with a secured authentication to facilitate the submission of larger numbers of genomes and the retrieval of results. In addition, IslandViewer's integrated GI predictions from multiple methods have been improved and expanded by integrating the precise Islander method for pre-computed genomes, as well as an updated IslandPath-DIMOB for both pre-computed and user-supplied custom genome analysis. Finally, pre-computed predictions including virulence factors and antimicrobial resistance are now available for 6193 complete bacterial and archaeal strains publicly available in RefSeq. IslandViewer 4 provides key enhancements to facilitate the analysis of GIs and better understand their role in the evolution of successful environmental microbes and pathogens.


Genome, Archaeal , Genome, Bacterial , Genomic Islands , Software , Datasets as Topic , Genes, Archaeal , Genes, Bacterial , Genomics , Internet , User-Computer Interface
6.
Nature ; 488(7409): 49-56, 2012 Aug 02.
Article En | MEDLINE | ID: mdl-22832581

Medulloblastoma, the most common malignant paediatric brain tumour, is currently treated with nonspecific cytotoxic therapies including surgery, whole-brain radiation, and aggressive chemotherapy. As medulloblastoma exhibits marked intertumoural heterogeneity, with at least four distinct molecular variants, previous attempts to identify targets for therapy have been underpowered because of small sample sizes. Here we report somatic copy number aberrations (SCNAs) in 1,087 unique medulloblastomas. SCNAs are common in medulloblastoma, and are predominantly subgroup-enriched. The most common region of focal copy number gain is a tandem duplication of SNCAIP, a gene associated with Parkinson's disease, which is exquisitely restricted to Group 4α. Recurrent translocations of PVT1, including PVT1-MYC and PVT1-NDRG1, that arise through chromothripsis are restricted to Group 3. Numerous targetable SCNAs, including recurrent events targeting TGF-β signalling in Group 3, and NF-κB signalling in Group 4, suggest future avenues for rational, targeted therapy.


Cerebellar Neoplasms/classification , Cerebellar Neoplasms/genetics , Genome, Human/genetics , Genomic Structural Variation/genetics , Medulloblastoma/classification , Medulloblastoma/genetics , Carrier Proteins/genetics , Cerebellar Neoplasms/metabolism , Child , DNA Copy Number Variations/genetics , Gene Duplication/genetics , Genes, myc/genetics , Genomics , Hedgehog Proteins/metabolism , Humans , Medulloblastoma/metabolism , NF-kappa B/metabolism , Nerve Tissue Proteins/genetics , Oncogene Proteins, Fusion/genetics , Proteins/genetics , RNA, Long Noncoding , Signal Transduction , Transforming Growth Factor beta/metabolism , Translocation, Genetic/genetics
7.
Nucleic Acids Res ; 39(Database issue): D28-31, 2011 Jan.
Article En | MEDLINE | ID: mdl-20972220

The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is Europe's primary nucleotide-sequence repository. The ENA consists of three main databases: the Sequence Read Archive (SRA), the Trace Archive and EMBL-Bank. The objective of ENA is to support and promote the use of nucleotide sequencing as an experimental research platform by providing data submission, archive, search and download services. In this article, we outline these services and describe major changes and improvements introduced during 2010. These include extended EMBL-Bank and SRA-data submission services, extended ENA Browser functionality, support for submitting data to the European Genome-phenome Archive (EGA) through SRA, and the launch of a new sequence similarity search service.


Base Sequence , Databases, Nucleic Acid , Europe , High-Throughput Nucleotide Sequencing , Molecular Sequence Annotation
8.
Nucleic Acids Res ; 38(Database issue): D39-45, 2010 Jan.
Article En | MEDLINE | ID: mdl-19906712

The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is Europe's primary nucleotide sequence archival resource, safeguarding open nucleotide data access, engaging in worldwide collaborative data exchange and integrating with the scientific publication process. ENA has made significant contributions to the collaborative nucleotide archival arena as an active proponent of extending the traditional collaboration to cover capillary and next-generation sequencing information. We have continued to co-develop data and metadata representation formats with our collaborators for both data exchange and public data dissemination. In addition to the DDBJ/EMBL/GenBank feature table format, we share metadata formats for capillary and next-generation sequencing traces and are using and contributing to the NCBI SRA Toolkit for the long-term storage of the next-generation sequence traces. During the course of 2009, ENA has significantly improved sequence submission, search and access functionalities provided at EMBL-EBI. In this article, we briefly describe the content and scope of our archive and introduce major improvements to our services.


Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Access to Information , Algorithms , Animals , Computational Biology/trends , DNA/genetics , Europe , Humans , Information Storage and Retrieval/methods , Internet , Software
9.
Nucleic Acids Res ; 37(Database issue): D19-25, 2009 Jan.
Article En | MEDLINE | ID: mdl-18978013

Dramatic increases in the throughput of nucleotide sequencing machines, and the promise of ever greater performance, have thrust bioinformatics into the era of petabyte-scale data sets. Sequence repositories, which provide the feed for these data sets into the worldwide computational infrastructure, are challenged by the impact of these data volumes. The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/embl), comprising the EMBL Nucleotide Sequence Database and the Ensembl Trace Archive, has identified challenges in the storage, movement, analysis, interpretation and visualization of petabyte-scale data sets. We present here our new repository for next generation sequence data, a brief summary of contents of the ENA and provide details of major developments to submission pipelines, high-throughput rule-based validation infrastructure and data integration approaches.


Databases, Nucleic Acid , Sequence Analysis/trends , Internet , Systems Integration
10.
Nucleic Acids Res ; 36(Database issue): D5-12, 2008 Jan.
Article En | MEDLINE | ID: mdl-18039715

The Ensembl Trace Archive (http://trace.ensembl.org/) and the EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/), known together as the European Nucleotide Archive, continue to see growth in data volume and diversity. Selected major developments of 2007 are presented briefly, along with data submission and retrieval information. In the face of increasing requirements for nucleotide trace, sequence and annotation data archiving, data capture priority decisions have been taken at the European Nucleotide Archive. Priorities are discussed in terms of how reliably information can be captured, the long-term benefits of its capture and the ease with which it can be captured.


Databases, Nucleic Acid , Sequence Analysis, DNA , Animals , Archives , Genomics , Internet
11.
Nucleic Acids Res ; 35(Database issue): D16-20, 2007 Jan.
Article En | MEDLINE | ID: mdl-17148479

The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl) at the EMBL European Bioinformatics Institute, UK, offers a large and freely accessible collection of nucleotide sequences and accompanying annotation. The database is maintained in collaboration with DDBJ and GenBank. Data are exchanged between the collaborating databases on a daily basis to achieve optimal synchrony. Webin is the preferred tool for individual submissions of nucleotide sequences, including Third Party Annotation, alignments and bulk data. Automated procedures are provided for submissions from large-scale sequencing projects and data from the European Patent Office. In 2006, the volume of data continued to grow exponentially. Access to the data is provided via SRS, ftp and a variety of other methods. Extensive external and internal cross-references enable users to search for related information across other databases and within the database. All available resources can be accessed via the EBI home page at http://www.ebi.ac.uk/. Changes over the past year include changes to the file format, further development of the EMBLCDS dataset and developments to the XML format.


Databases, Nucleic Acid , Base Sequence , Databases, Nucleic Acid/trends , Internet , User-Computer Interface
...