Results 1 - 20 of 94
1.
Microb Genom ; 10(2)2024 Feb.
Article in English | MEDLINE | ID: mdl-38358325

ABSTRACT

The COVID-19 pandemic has seen large-scale pathogen genomic sequencing efforts, becoming part of the toolbox for surveillance and epidemic research. This resulted in an unprecedented level of data sharing to open repositories, which has actively supported the identification of SARS-CoV-2 structure, molecular interactions, mutations and variants, and facilitated vaccine development and drug reuse studies and design. The European COVID-19 Data Platform was launched to support this data sharing, and has resulted in the deposition of several million SARS-CoV-2 raw reads. In this paper we describe (1) open data sharing, (2) tools for submission, analysis, visualisation and data claiming (e.g. ORCiD), (3) the systematic analysis of these datasets, at scale via the SARS-CoV-2 Data Hubs as well as (4) lessons learnt. This paper describes a component of the Platform, the SARS-CoV-2 Data Hubs, which enable the extension and set up of infrastructure that we intend to use more widely in the future for pathogen surveillance and pandemic preparedness.


Subjects
COVID-19; SARS-CoV-2; Humans; SARS-CoV-2/genetics; Pandemics; COVID-19/epidemiology; Genomics; Information Dissemination
2.
Nucleic Acids Res ; 52(D1): D92-D97, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37956313

ABSTRACT

The European Nucleotide Archive (ENA; https://www.ebi.ac.uk/ena) is maintained by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI). The ENA is one of the three members of the International Nucleotide Sequence Database Collaboration (INSDC). It serves the bioinformatics community worldwide via the submission, processing, archiving and dissemination of sequence data. The ENA supports data types ranging from raw reads, through alignments and assemblies to functional annotation. The data is enriched with contextual information relating to samples and experimental configurations. In this article, we describe recent progress and improvements to ENA services. In particular, we focus upon three areas of work in 2023: FAIRness of ENA data, pandemic preparedness and foundational technology. For FAIRness, we have introduced minimal requirements for spatiotemporal annotation, created a metadata-based classification system, incorporated third party metadata curations with archived records, and developed a new rapid visualisation platform, the ENA Notebooks. For foundational enhancements, we have improved the INSDC data exchange and synchronisation pipelines, and invested in site reliability engineering for ENA infrastructure. In order to support genomic surveillance efforts, we have continued to provide ENA services in support of SARS-CoV-2 data mobilisation and have adapted these for broader pathogen surveillance efforts.


Subjects
Genomics; Nucleotides; Computational Biology; Databases, Nucleic Acid; Internet; Reproducibility of Results; Europe
3.
Microb Genom ; 9(12)2023 Dec.
Article in English | MEDLINE | ID: mdl-38085797

ABSTRACT

Fast, efficient public health actions require well-organized and coordinated systems that can supply timely and accurate knowledge. Public databases of pathogen genomic data, such as the International Nucleotide Sequence Database Collaboration (INSDC), have become essential tools for efficient public health decisions. However, these international resources began primarily for academic purposes, rather than for surveillance or interventions. Now, queries need to access not only the whole genomes of multiple pathogens but also make connections using robust contextual metadata to identify issues of public health relevance. Databases that over time developed a patchwork of submission formats and requirements need to be consistently organized and coordinated internationally to allow effective searches. To help resolve these issues, we propose a common pathogen data structure called the Pathogen Data Object Model (DOM) that will formalize the minimum pieces of sequence data and contextual data necessary for general public health uses, while recognizing that submitters will likely withhold a wide range of non-public contextual data. Further, we propose contributors use the Pathogen DOM for all pathogen submissions (bacterial, viral, fungal, and parasites), which will simplify data submissions and provide a consistent and transparent data structure for downstream data analyses. We also highlight how improved submission tools can support the Pathogen DOM, offering users additional easy-to-use methods to ensure this structure is followed.


Subjects
Nucleotides; Public Health; Base Sequence; Genomics/methods; Databases, Nucleic Acid
4.
J Mol Biol ; 435(14): 168016, 2023 07 15.
Article in English | MEDLINE | ID: mdl-36806692

ABSTRACT

An increasingly common output arising from the analysis of shotgun metagenomic datasets is the generation of metagenome-assembled genomes (MAGs), with tens of thousands of MAGs now described in the literature. However, the discovery and comparison of these MAG collections is hampered by the lack of uniformity in their generation, annotation and storage. To address this, we have developed MGnify Genomes, a growing collection of biome-specific non-redundant microbial genome catalogues generated using MAGs and publicly available isolate genomes. Genomes within a biome-specific catalogue are organised into species clusters. For species that contain multiple conspecific genomes, the highest quality genome is selected as the representative, always prioritising an isolate genome over a MAG. The species representative sequences and annotations can be visualised on the MGnify website and the full catalogue and associated analysis outputs can be downloaded from MGnify servers. A suite of online search tools is provided allowing users to compare their own sequences, ranging from a gene to sets of genomes, against the catalogues. Seven biomes are available currently, comprising over 300,000 genomes that represent 11,048 non-redundant species, and include 36 taxonomic classes not currently represented by cultured genomes. MGnify Genomes is available at https://www.ebi.ac.uk/metagenomics/browse/genomes/.
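
The representative-selection rule described in this abstract (highest-quality genome per species cluster, with an isolate always preferred over a MAG) can be illustrated with a minimal sketch. This is not the MGnify implementation: the `Genome` record, its `quality` score and the accessions below are hypothetical names introduced only for illustration, and the abstract does not say how quality is actually scored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Genome:
    accession: str    # hypothetical identifier, for illustration only
    is_isolate: bool  # True for an isolate genome, False for a MAG
    quality: float    # hypothetical quality score; higher is better


def pick_representative(cluster):
    """Return the representative genome of one species cluster."""
    if not cluster:
        raise ValueError("empty species cluster")
    # Compare on (is_isolate, quality): any isolate outranks any MAG,
    # and quality only breaks ties within the same genome type.
    return max(cluster, key=lambda g: (g.is_isolate, g.quality))


cluster = [
    Genome("MAG_A", False, 98.0),
    Genome("ISO_B", True, 91.5),
    Genome("MAG_C", False, 95.2),
]
representative = pick_representative(cluster)
```

Sorting on the tuple keeps the isolate-first priority strict: a MAG is chosen only when the cluster contains no isolate genome at all.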


Subjects
Genome, Microbial; Metagenome; Metagenome/genetics; Metagenomics
5.
Nucleic Acids Res ; 51(D1): D121-D125, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36399492

ABSTRACT

The European Nucleotide Archive (ENA; https://www.ebi.ac.uk/ena), maintained by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI), offers those producing data an open and supported platform for the management, archiving, publication, and dissemination of data; and to the scientific community as a whole, it offers a globally comprehensive data set through a host of data discovery and retrieval tools. Here, we describe recent updates to the ENA's submission and retrieval services as well as focused efforts to improve connectivity, reusability, and interoperability of ENA data and metadata.


Subjects
Databases, Nucleic Acid; Academies and Institutes; Computational Biology; Internet; Software; Datasets as Topic
6.
Nucleic Acids Res ; 51(D1): D753-D759, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36477304

ABSTRACT

The MGnify platform (https://www.ebi.ac.uk/metagenomics) facilitates the assembly, analysis and archiving of microbiome-derived nucleic acid sequences. The platform provides access to taxonomic assignments and functional annotations for nearly half a million analyses covering metabarcoding, metatranscriptomic, and metagenomic datasets, which are derived from a wide range of different environments. Over the past 3 years, MGnify has not only grown in terms of the number of datasets contained but also increased the breadth of analyses provided, such as the analysis of long-read sequences. The MGnify protein database now exceeds 2.4 billion non-redundant sequences predicted from metagenomic assemblies. This collection is now organised into a relational database making it possible to understand the genomic context of the protein through navigation back to the source assembly and sample metadata, marking a major improvement. To extend beyond the functional annotations already provided in MGnify, we have applied deep learning-based annotation methods. The technology underlying MGnify's Application Programming Interface (API) and website has been upgraded, and we have enabled the ability to perform downstream analysis of the MGnify data through the introduction of a coupled Jupyter Lab environment.


Subjects
Microbiota; Sequence Analysis; Genomics/methods; Metagenome; Metagenomics/methods; Microbiota/genetics; Software; Sequence Analysis/methods
7.
Commun Biol ; 5(1): 1217, 2022 11 18.
Article in English | MEDLINE | ID: mdl-36400841

ABSTRACT

Understanding the myriad pathways by which antimicrobial-resistance genes (ARGs) spread across biomes is necessary to counteract the global menace of antimicrobial resistance. We screened 17939 assembled metagenomic samples covering 21 biomes, differing in sequencing quality and depth, unevenly across 46 countries, 6 continents, and 14 years (2005-2019) for clinically crucial ARGs, mobile colistin resistance (mcr), carbapenem resistance (CR), and (extended-spectrum) beta-lactamase (ESBL and BL) genes. These ARGs were most frequent in human gut, oral and skin biomes, followed by anthropogenic (wastewater, bioreactor, compost, food), and natural biomes (freshwater, marine, sediment). Mcr-9 was the most prevalent mcr gene, spatially and temporally; blaOXA-233 and blaTEM-1 were the most prevalent CR and BL/ESBL genes, but blaGES-2 and blaTEM-116 showed the widest distribution. Redundancy analysis and Bayesian analysis showed ARG distribution was non-random and best-explained by potential host genera and biomes, followed by collection year, anthropogenic factors and collection countries. Preferential ARG occurrence, and potential transmission, between characteristically similar biomes indicate strong ecological boundaries. Our results provide a high-resolution global map of ARG distribution and importantly, identify checkpoint biomes wherein interventions aimed at disrupting ARGs dissemination are likely to be most effective in reducing dissemination and in the long term, the ARG global burden.


Subjects
Anti-Bacterial Agents; Microbiota; Humans; Anti-Bacterial Agents/pharmacology; Drug Resistance, Bacterial/genetics; Bayes Theorem; Microbiota/genetics; Genes, Bacterial
8.
Science ; 376(6589): 156-162, 2022 04 08.
Article in English | MEDLINE | ID: mdl-35389782

ABSTRACT

Whereas DNA viruses are known to be abundant, diverse, and commonly key ecosystem players, RNA viruses are insufficiently studied outside disease settings. In this study, we analyzed ≈28 terabases of Global Ocean RNA sequences to expand Earth's RNA virus catalogs and their taxonomy, investigate their evolutionary origins, and assess their marine biogeography from pole to pole. Using new approaches to optimize discovery and classification, we identified RNA viruses that necessitate substantive revisions of taxonomy (doubling phyla and adding >50% new classes) and evolutionary understanding. "Species"-rank abundance determination revealed that viruses of the new phyla "Taraviricota," a missing link in early RNA virus evolution, and "Arctiviricota" are widespread and dominant in the oceans. These efforts provide foundational knowledge critical to integrating RNA viruses into ecological and epidemiological models.


Subjects
Genome, Viral; RNA Viruses; Viruses; Biological Evolution; Ecosystem; Oceans and Seas; Phylogeny; RNA; RNA Viruses/genetics; Virome/genetics; Viruses/genetics
10.
Nucleic Acids Res ; 50(D1): D106-D110, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34850158

ABSTRACT

The European Nucleotide Archive (ENA, https://www.ebi.ac.uk/ena), maintained at the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) provides freely accessible services, both for deposition of, and access to, open nucleotide sequencing data. Open scientific data are of paramount importance to the scientific community and contribute daily to the acceleration of scientific advance. Here, we outline the major updates to ENA's services and infrastructure that have been delivered over the past year.


Subjects
Computational Biology; Databases, Nucleic Acid; Nucleotides/genetics; Software; High-Throughput Nucleotide Sequencing; Humans; Internet; Molecular Sequence Annotation; Nucleotides/classification
11.
Gigascience ; 10(12)2021 12 29.
Article in English | MEDLINE | ID: mdl-34966925

ABSTRACT

BACKGROUND: Linking nucleotide sequence data (NSD) to scientific publication citations can enhance understanding of NSD provenance, scientific use, and reuse in the community. By connecting publications with NSD records, NSD geographical provenance information, and author geographical information, it becomes possible to assess the contribution of NSD to infer trends in scientific knowledge gain at the global level. FINDINGS: We extracted and linked records from the European Nucleotide Archive to citations in open-access publications aggregated at Europe PubMed Central. A total of 8,464,292 ENA accessions with geographical provenance information were associated with publications. We conducted a data quality review to uncover potential issues in publication citation information extraction and author affiliation tagging and developed and implemented best-practice recommendations for citation extraction. We constructed flat data tables and a data warehouse with an interactive web application to enable ad hoc exploration of NSD use and summary statistics. CONCLUSIONS: The extraction and linking of NSD with associated publication citations enables transparency. The quality review contributes to enhanced text mining methods for identifier extraction and use. Furthermore, the global provision and use of NSD enable scientists worldwide to join literature and sequence databases in a multidimensional fashion. As a concrete use case, we visualized statistics of country clusters concerning NSD access in the context of discussions around digital sequence information under the United Nations Convention on Biological Diversity.


Subjects
Data Mining; Nucleotides; Base Sequence; Databases, Nucleic Acid; Europe
12.
Gigascience ; 10(12)2021 12 29.
Article in English | MEDLINE | ID: mdl-34966927

ABSTRACT

BACKGROUND: The United Nations Convention on Biological Diversity (CBD) formally recognized the sovereign rights of nations over their biological diversity. Implicit within the treaty is the idea that mega-biodiverse countries will provide genetic resources and grant access to them and scientists in high-income countries will use these resources and share back benefits. However, little research has been conducted on how this framework is reflected in real-life scientific practice. RESULT: Currently, parties to the CBD are debating whether digital sequence information (DSI) should be regulated under a new benefit-sharing framework. At this critical time point in the upcoming international negotiations, we test the fundamental hypothesis of provision and use of DSI by looking at the global patterns of access and use in scientific publications. CONCLUSION: Our data reject the provider-user relationship and suggest a far more complex information flow for DSI. Therefore, any new policy decisions on DSI should be aware of the high level of use of DSI across low- and middle-income countries and seek to preserve open access to this crucial common good.


Subjects
Biodiversity; International Cooperation
13.
Front Genet ; 12: 639238, 2021.
Article in English | MEDLINE | ID: mdl-34220930

ABSTRACT

The Functional Annotation of ANimal Genomes (FAANG) project is a worldwide coordinated action creating high-quality functional annotation of farmed and companion animal genomes. The generation of a rich genome-to-phenome resource and supporting informatic infrastructure advances the scope of comparative genomics and furthers the understanding of functional elements. The project also provides terrestrial and aquatic animal agriculture community powerful resources for supporting improvements to farmed animal production, disease resistance, and genetic diversity. The FAANG Data Portal (https://data.faang.org) ensures Findable, Accessible, Interoperable and Reusable (FAIR) open access to the wealth of sample, sequencing, and analysis data produced by an ever-growing number of FAANG consortia. It is developed and maintained by the FAANG Data Coordination Centre (DCC) at the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI). FAANG projects produce a standardised set of multi-omic assays with resulting data placed into a range of specialised open data archives. To ensure this data is easily findable and accessible by the community, the portal automatically identifies and collates all submitted FAANG data into a single easily searchable resource. The Data Portal supports direct download from the multiple underlying archives to enable seamless access to all FAANG data from within the portal itself. The portal provides a range of predefined filters, powerful predictive search, and a catalogue of sampling and analysis protocols and automatically identifies publications associated with any dataset. To ensure all FAANG data submissions are high-quality, the portal includes powerful contextual metadata validation and data submissions brokering to the underlying EMBL-EBI archives. 
The portal will incorporate extensive new technical infrastructure to effectively deliver and standardise FAANG's shift to single-cellomics, cell atlases, pangenomes, and novel phenotypic prediction models. The Data Portal plays a key role for FAANG by supporting high-quality functional annotation of animal genomes, through open FAIR sharing of data, complete with standardised rich metadata. Future Data Portal features developed by the DCC will support new technological developments for continued improvement for FAANG projects.

14.
Nucleic Acids Res ; 49(W1): W619-W623, 2021 07 02.
Article in English | MEDLINE | ID: mdl-34048576

ABSTRACT

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic will be remembered as one of the defining events of the 21st century. The rapid global outbreak has had significant impacts on human society and is already responsible for millions of deaths. Understanding and tackling the impact of the virus has required a worldwide mobilisation and coordination of scientific research. The COVID-19 Data Portal (https://www.covid19dataportal.org/) was first released as part of the European COVID-19 Data Platform, on April 20th 2020 to facilitate rapid and open data sharing and analysis, to accelerate global SARS-CoV-2 and COVID-19 research. The COVID-19 Data Portal has fortnightly feature releases to continue to add new data types, search options, visualisations and improvements based on user feedback and research. The open datasets and intuitive suite of search, identification and download services, represent a truly FAIR (Findable, Accessible, Interoperable and Reusable) resource that enables researchers to easily identify and quickly obtain the key datasets needed for their COVID-19 research.


Subjects
Biomedical Research; COVID-19; Databases, Factual; Datasets as Topic; Information Dissemination; Open Access Publishing; SARS-CoV-2; COVID-19/epidemiology; COVID-19/genetics; COVID-19/virology; Databases, Bibliographic; Disease Outbreaks; Humans; Pandemics; SARS-CoV-2/chemistry; SARS-CoV-2/genetics; SARS-CoV-2/metabolism; SARS-CoV-2/ultrastructure; Time Factors; Viral Proteins/chemistry; Viral Proteins/genetics
15.
PLoS One ; 16(1): e0245475, 2021.
Article in English | MEDLINE | ID: mdl-33476328

ABSTRACT

INTRODUCTION: Depression, cardiovascular diseases and diabetes are among the major non-communicable diseases, leading to significant disability and mortality worldwide. These diseases may share environmental and genetic determinants associated with multimorbid patterns. Stressful early-life events are among the primary factors associated with the development of mental and physical diseases. However, possible causative mechanisms linking early life stress (ELS) with psycho-cardio-metabolic (PCM) multi-morbidity are not well understood. This prevents a full understanding of causal pathways towards the shared risk of these diseases and the development of coordinated preventive and therapeutic interventions. METHODS AND ANALYSIS: This paper describes the study protocol for EarlyCause, a large-scale and inter-disciplinary research project funded by the European Union's Horizon 2020 research and innovation programme. The project takes advantage of human longitudinal birth cohort data, animal studies and cellular models to test the hypothesis of shared mechanisms and molecular pathways by which ELS shapes an individual's physical and mental health in adulthood. The study will research in detail how ELS converts into biological signals embedded simultaneously or sequentially in the brain, the cardiovascular and metabolic systems. The research will mainly focus on four biological processes including possible alterations of the epigenome, neuroendocrine system, inflammatome, and the gut microbiome. Life-course models will integrate the role of modifying factors as sex, socioeconomics, and lifestyle with the goal to better identify groups at risk as well as inform promising strategies to reverse the possible mechanisms and/or reduce the impact of ELS on multi-morbidity development in high-risk individuals. These strategies will help better manage the impact of multi-morbidity on human health and the associated risk.


Subjects
Cardiovascular Diseases/epidemiology; Cardiovascular Diseases/etiology; Depression/epidemiology; Depression/etiology; Diabetes Mellitus/epidemiology; Diabetes Mellitus/etiology; Stress, Psychological/complications; Adult; Adverse Childhood Experiences/psychology; Biomarkers/metabolism; Cardiovascular Diseases/metabolism; Cardiovascular Diseases/psychology; Child; Depression/metabolism; Depression/psychology; Diabetes Mellitus/metabolism; Diabetes Mellitus/psychology; Environment; Humans; Longitudinal Studies; Morbidity; Risk Factors
16.
Nucleic Acids Res ; 49(D1): D29-D37, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33245775

ABSTRACT

The European Bioinformatics Institute (EMBL-EBI; https://www.ebi.ac.uk/) provides freely available data and bioinformatics services to the scientific community, alongside its research activity and training provision. The 2020 COVID-19 pandemic has brought to the forefront a need for the scientific community to work even more cooperatively to effectively tackle a global health crisis. EMBL-EBI has been able to build on its position to contribute to the fight against COVID-19 in a number of ways. Firstly, EMBL-EBI has used its infrastructure, expertise and network of international collaborations to help build the European COVID-19 Data Platform (https://www.covid19dataportal.org/), which brings together COVID-19 biomolecular data and connects it to researchers, clinicians and public health professionals. By September 2020, the COVID-19 Data Platform has integrated in excess of 170 000 COVID-19 biomolecular data and literature records, collected through a number of EMBL-EBI resources. Secondly, EMBL-EBI has strived to continue its support of the life science communities through the crisis, with updated Training provision and improved service provision throughout its resources. The COVID-19 pandemic has highlighted the importance of EMBL-EBI's core principles, including international cooperation, resource sharing and central data brokering, and has further empowered scientific cooperation.


Subjects
COVID-19/prevention & control; Computational Biology/statistics & numerical data; Databases, Nucleic Acid/statistics & numerical data; Information Storage and Retrieval/methods; SARS-CoV-2/genetics; Viral Proteins/genetics; COVID-19/epidemiology; COVID-19/virology; Computational Biology/methods; Computational Biology/organization & administration; Databases, Nucleic Acid/organization & administration; Global Health; Humans; Information Storage and Retrieval/statistics & numerical data; Internet; Pandemics; SARS-CoV-2/metabolism; SARS-CoV-2/physiology; Viral Proteins/metabolism
17.
Nucleic Acids Res ; 49(D1): D82-D85, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33175160

ABSTRACT

The European Nucleotide Archive (ENA; https://www.ebi.ac.uk/ena), provided by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI), has for almost forty years continued in its mission to freely archive and present the world's public sequencing data for the benefit of the entire scientific community and for the acceleration of the global research effort. Here we highlight the major developments to ENA services and content in 2020, focussing in particular on the recently released updated ENA browser, modernisation of our release process and our data coordination collaborations with specific research communities.


Subjects
Computational Biology/methods; Databases, Nucleic Acid/trends; Nucleic Acids/genetics; Nucleotides/genetics; Databases, Nucleic Acid/statistics & numerical data; Europe; High-Throughput Nucleotide Sequencing; Humans; Internet; Molecular Sequence Annotation; Nucleic Acids/chemistry; Nucleotides/chemistry; Sequence Analysis, DNA; Sequence Analysis, RNA
18.
Nucleic Acids Res ; 49(D1): D121-D124, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33166387

ABSTRACT

The International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org/) has been the core infrastructure for collecting and providing nucleotide sequence data and metadata for >30 years. Three partner organizations, the DNA Data Bank of Japan (DDBJ) at the National Institute of Genetics in Mishima, Japan; the European Nucleotide Archive (ENA) at the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) in Hinxton, UK; and GenBank at National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health in Bethesda, Maryland, USA have been collaboratively maintaining the INSDC for the benefit of not only science but all types of community worldwide.


Subjects
Databases, Nucleic Acid; Metadata/statistics & numerical data; Nucleotides/genetics; Sequence Analysis, DNA/statistics & numerical data; Sequence Analysis, RNA/statistics & numerical data; Academies and Institutes; Base Sequence; Europe; High-Throughput Nucleotide Sequencing/statistics & numerical data; Humans; International Cooperation; Japan; Nucleotides/metabolism; United States
19.
Microb Genom ; 6(5)2020 05.
Article in English | MEDLINE | ID: mdl-32255760

ABSTRACT

Antimicrobial resistance (AMR) is an emerging threat to modern medicine. Improved diagnostics and surveillance of resistant bacteria require the development of next-generation analysis tools and collaboration between international partners. Here, we present the 'AMR Data Hub', an online infrastructure for storage and sharing of structured phenotypic AMR data linked to bacterial whole-genome sequences. Leveraging infrastructure built by the European COMPARE Consortium and structured around the European Nucleotide Archive (ENA), the AMR Data Hub already provides an extensive data collection of more than 2500 isolates with linked genome and AMR data. Representing these data in standardized formats, we provide tools for the validation and submission of new data and services supporting search, browse and retrieval. The current collection was created through a collaboration by several partners from the European COMPARE Consortium, demonstrating the capacities and utility of the AMR Data Hub and its associated tools. We anticipate growth of content and offer the hub as a basis for future research into methods to explore and predict AMR.


Subjects
Anti-Bacterial Agents/pharmacology; Bacteria/genetics; Drug Resistance, Bacterial; Whole Genome Sequencing/methods; Bacteria/drug effects; Databases, Genetic; High-Throughput Nucleotide Sequencing; Internet; Phenotype
20.
G3 (Bethesda) ; 10(4): 1361-1374, 2020 04 09.
Article in English | MEDLINE | ID: mdl-32071071

ABSTRACT

Reconstruction of target genomes from sequence data produced by instruments that are agnostic as to the species-of-origin may be confounded by contaminant DNA. Whether introduced during sample processing or through co-extraction alongside the target DNA, if insufficient care is taken during the assembly process, the final assembled genome may be a mixture of data from several species. Such assemblies can confound sequence-based biological inference and, when deposited in public databases, may be included in downstream analyses by users unaware of underlying problems. We present BlobToolKit, a software suite to aid researchers in identifying and isolating non-target data in draft and publicly available genome assemblies. BlobToolKit can be used to process assembly, read and analysis files for fully reproducible interactive exploration in the browser-based Viewer. BlobToolKit can be used during assembly to filter non-target DNA, helping researchers produce assemblies with high biological credibility. We have been running an automated BlobToolKit pipeline on eukaryotic assemblies publicly available in the International Nucleotide Sequence Database Collaboration and are making the results available through a public instance of the Viewer at https://blobtoolkit.genomehubs.org/view. We aim to complete analysis of all publicly available genomes and then maintain currency with the flow of new genomes. We have worked to embed these views into the presentation of genome assemblies at the European Nucleotide Archive, providing an indication of assembly quality alongside the public record with links out to allow full exploration in the Viewer.


Subjects
Genome; Software; Sequence Analysis, DNA