Results 1 - 11 of 11
1.
Commun Biol ; 5(1): 1217, 2022 11 18.
Article in English | MEDLINE | ID: mdl-36400841

ABSTRACT

Understanding the myriad pathways by which antimicrobial-resistance genes (ARGs) spread across biomes is necessary to counteract the global menace of antimicrobial resistance. We screened 17,939 assembled metagenomic samples covering 21 biomes, differing in sequencing quality and depth and distributed unevenly across 46 countries, 6 continents and 14 years (2005-2019), for clinically crucial ARGs: mobile colistin resistance (mcr), carbapenem resistance (CR), and (extended-spectrum) beta-lactamase (ESBL and BL) genes. These ARGs were most frequent in human gut, oral and skin biomes, followed by anthropogenic (wastewater, bioreactor, compost, food) and natural biomes (freshwater, marine, sediment). Mcr-9 was the most prevalent mcr gene, spatially and temporally; blaOXA-233 and blaTEM-1 were the most prevalent CR and BL/ESBL genes, but blaGES-2 and blaTEM-116 showed the widest distribution. Redundancy analysis and Bayesian analysis showed that ARG distribution was non-random and best explained by potential host genera and biomes, followed by collection year, anthropogenic factors and collection countries. Preferential ARG occurrence, and potential transmission, between characteristically similar biomes indicate strong ecological boundaries. Our results provide a high-resolution global map of ARG distribution and, importantly, identify checkpoint biomes wherein interventions aimed at disrupting ARG dissemination are likely to be most effective in reducing dissemination and, in the long term, the global ARG burden.


Subject(s)
Anti-Bacterial Agents , Microbiota , Humans , Anti-Bacterial Agents/pharmacology , Drug Resistance, Bacterial/genetics , Bayes Theorem , Microbiota/genetics , Genes, Bacterial
2.
Gigascience ; 10(12), 2021 12 29.
Article in English | MEDLINE | ID: mdl-34966925

ABSTRACT

BACKGROUND: Linking nucleotide sequence data (NSD) to scientific publication citations can enhance understanding of NSD provenance, scientific use, and reuse in the community. By connecting publications with NSD records, NSD geographical provenance information, and author geographical information, it becomes possible to assess the contribution of NSD to infer trends in scientific knowledge gain at the global level. FINDINGS: We extracted and linked records from the European Nucleotide Archive to citations in open-access publications aggregated at Europe PubMed Central. A total of 8,464,292 ENA accessions with geographical provenance information were associated with publications. We conducted a data quality review to uncover potential issues in publication citation information extraction and author affiliation tagging and developed and implemented best-practice recommendations for citation extraction. We constructed flat data tables and a data warehouse with an interactive web application to enable ad hoc exploration of NSD use and summary statistics. CONCLUSIONS: The extraction and linking of NSD with associated publication citations enables transparency. The quality review contributes to enhanced text mining methods for identifier extraction and use. Furthermore, the global provision and use of NSD enable scientists worldwide to join literature and sequence databases in a multidimensional fashion. As a concrete use case, we visualized statistics of country clusters concerning NSD access in the context of discussions around digital sequence information under the United Nations Convention on Biological Diversity.
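The identifier-extraction step described above can be illustrated with a minimal sketch. It assumes a simplified subset of INSDC/ENA accession prefixes for projects, studies, samples, experiments and runs; the patterns and the `extract_accessions` helper are hypothetical illustrations, not the extraction rules the authors used and not the full set of ENA accession formats.

```python
import re

# Simplified, assumed patterns for a subset of INSDC/ENA accessions.
# Real accession formats are broader (e.g. sequence and assembly accessions).
ACCESSION_PATTERNS = {
    "project": r"PRJ[EDN][A-Z]\d+",
    "study": r"[EDS]RP\d{6,}",
    "sample": r"SAM[EDN][A-Z]?\d+",
    "experiment": r"[EDS]RX\d{6,}",
    "run": r"[EDS]RR\d{6,}",
}

# One regex with a named group per accession type; \b prevents matches inside longer tokens.
COMBINED = re.compile(
    r"\b(?:" + "|".join(f"(?P<{name}>{pat})" for name, pat in ACCESSION_PATTERNS.items()) + r")\b"
)


def extract_accessions(text: str) -> dict[str, list[str]]:
    """Return unique accessions per type, in order of first appearance."""
    found: dict[str, list[str]] = {name: [] for name in ACCESSION_PATTERNS}
    for match in COMBINED.finditer(text):
        kind = match.lastgroup  # name of the alternative that matched
        accession = match.group(kind)
        if accession not in found[kind]:
            found[kind].append(accession)
    return found


text = ("Reads were deposited under PRJEB12345 "
        "(runs ERR1234567 and ERR1234568; sample SAMEA1234567).")
print(extract_accessions(text))
```

Word boundaries stop the patterns from firing inside longer tokens, but pattern matching alone still produces false positives, which is why a quality review of the extracted citations, as described in the abstract, remains necessary.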


Subject(s)
Data Mining , Nucleotides , Base Sequence , Databases, Nucleic Acid , Europe
3.
PLoS Biol ; 19(11): e3001421, 2021 11.
Article in English | MEDLINE | ID: mdl-34752446

ABSTRACT

The open sharing of genomic data provides an incredibly rich resource for the study of bacterial evolution and function and even anthropogenic activities such as the widespread use of antimicrobials. However, these data consist of genomes assembled with different tools and levels of quality checking, and of large volumes of completely unprocessed raw sequence data. In both cases, considerable computational effort is required before biological questions can be addressed. Here, we assembled and characterised 661,405 bacterial genomes retrieved from the European Nucleotide Archive (ENA) in November of 2018 using a uniform standardised approach. Of these, 311,006 did not previously have an assembly. We produced a searchable COmpact Bit-sliced Signature (COBS) index, facilitating the easy interrogation of the entire dataset for a specific sequence (e.g., gene, mutation, or plasmid). Additional MinHash and pp-sketch indices support genome-wide comparisons and estimations of genomic distance. Combined, this resource will allow data to be easily subset and searched, phylogenetic relationships between genomes to be quickly elucidated, and hypotheses rapidly generated and tested. We believe that this combination of uniform processing and variety of search/filter functionalities will make this a resource of very wide utility. In terms of diversity within the data, a breakdown of the 639,981 high-quality genomes emphasised the uneven species composition of the ENA/public databases, with just 20 of the total 2,336 species making up 90% of the genomes. The overrepresented species tend to be acute/common human pathogens, aligning with research priorities at different levels from individual interests to funding bodies and national and global public health agencies.
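The MinHash indices mentioned above rest on a simple property: for a random hash function, the probability that two sets share the same minimum hash value equals their Jaccard similarity. The sketch below illustrates that principle on k-mer sets and converts the estimate into a Mash-style distance, D = -(1/k) ln(2J / (1 + J)). It is a hypothetical, unoptimized illustration (seeded BLAKE2b hashes, forward-strand k-mers only), not the tooling used to build the resource.

```python
import hashlib
import math


def kmers(seq: str, k: int) -> set[str]:
    """Distinct k-mers of a DNA string (forward strand only, no canonical k-mers)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer: str, seed: int) -> int:
    """64-bit seeded hash; the seed is used as a BLAKE2b salt to obtain distinct hash functions."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8, salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(kmer_set: set[str], num_hashes: int) -> list[int]:
    """For each of num_hashes hash functions, keep the minimum hash value over the set."""
    return [min(_hash(km, seed) for km in kmer_set) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of hash functions whose minima agree estimates |A & B| / |A | B|."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def mash_distance(jaccard: float, k: int) -> float:
    """Mash-style distance; no shared hashes is treated here as the maximal distance 1.0."""
    if jaccard == 0:
        return 1.0
    return max(0.0, -math.log(2 * jaccard / (1 + jaccard)) / k)


a = kmers("ACGTTGCATGCAAGTCCGATAGGCTTACG", 5)
b = kmers("ACGTTGCATGCAAGTCCGATAGGCTTTTT", 5)
j = estimate_jaccard(minhash_signature(a, 64), minhash_signature(b, 64))
print(round(j, 2), round(mash_distance(j, 5), 3))
```

Signatures have a fixed length regardless of genome size, which is what makes all-against-all comparisons over hundreds of thousands of genomes tractable; production tools add canonical k-mers and far faster hashing.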


Subject(s)
Bacteria/genetics , Biodiversity , DNA, Bacterial/genetics , Data Curation , Base Sequence , Drug Resistance, Bacterial/genetics , Species Specificity
4.
Nucleic Acids Res ; 49(D1): D82-D85, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33175160

ABSTRACT

The European Nucleotide Archive (ENA; https://www.ebi.ac.uk/ena), provided by the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI), has for almost forty years continued in its mission to freely archive and present the world's public sequencing data for the benefit of the entire scientific community and for the acceleration of the global research effort. Here we highlight the major developments to ENA services and content in 2020, focussing in particular on the recently released updated ENA browser, modernisation of our release process and our data coordination collaborations with specific research communities.


Subject(s)
Computational Biology/methods , Databases, Nucleic Acid/trends , Nucleic Acids/genetics , Nucleotides/genetics , Databases, Nucleic Acid/statistics & numerical data , Europe , High-Throughput Nucleotide Sequencing , Humans , Internet , Molecular Sequence Annotation , Nucleic Acids/chemistry , Nucleotides/chemistry , Sequence Analysis, DNA , Sequence Analysis, RNA
5.
iScience ; 23(1): 100769, 2020 Jan 24.
Article in English | MEDLINE | ID: mdl-31887656

ABSTRACT

Despite rapid advances in whole genome sequencing (WGS) technologies, their integration into routine microbiological diagnostics has been hampered by the lack of standardized downstream bioinformatics analysis. We developed a comprehensive and computationally low-resource bioinformatics pipeline (BacPipe) enabling direct analyses of bacterial whole-genome sequences (raw reads or contigs) obtained from second- or third-generation sequencing technologies. A graphical user interface was developed to visualize real-time progression of the analysis. The scalability and speed of BacPipe in handling large datasets were demonstrated using 4,139 Illumina paired-end sequence files of publicly available bacterial genomes (2.9-5.4 Mb) from the European Nucleotide Archive. BacPipe is integrated in EBI-SELECTA, a project-specific portal (H2020-COMPARE), and is available as an independent docker image that can be used across Windows- and Unix-based systems. BacPipe offers a fully automated "one-stop" bacterial WGS analysis pipeline to overcome the major hurdle of WGS data analysis in hospitals, public health and infection control monitoring.

6.
Nucleic Acids Res ; 48(D1): D70-D76, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31722421

ABSTRACT

The European Nucleotide Archive (ENA, https://www.ebi.ac.uk/ena) at the European Molecular Biology Laboratory's European Bioinformatics Institute provides open and freely available data deposition and access services across the spectrum of nucleotide sequence data types. Making the world's public sequencing datasets available to the scientific community, the ENA represents a globally comprehensive nucleotide sequence resource. Here, we outline ENA services and content in 2019 and provide an insight into selected key areas of development in this period.


Subject(s)
Computational Biology , Databases, Nucleic Acid , Genomics , Computational Biology/methods , Europe , Genomics/methods , Molecular Sequence Annotation , Software , User-Computer Interface , Web Browser
7.
Database (Oxford) ; 2019, 2019 01 01.
Article in English | MEDLINE | ID: mdl-31868882

ABSTRACT

Data sharing enables research communities to exchange findings and build upon the knowledge that arises from their discoveries. Areas of public and animal health as well as food safety would benefit from rapid data sharing when it comes to emergencies. However, ethical, regulatory and institutional challenges, as well as the lack of suitable platforms providing an infrastructure for data sharing in structured formats, often lead to data not being shared, or at most being shared in the form of supplementary materials in journal publications. Here, we describe an informatics platform that includes workflows for structured data storage, management and pre-publication sharing of pathogen sequencing data and their analysis interpretations with relevant stakeholders.


Subject(s)
Databases, Factual , Information Dissemination , Bacteria/classification , Metagenomics , Phylogeny , User-Computer Interface
8.
PLoS Pathog ; 6(2): e1000784, 2010 Feb 26.
Article in English | MEDLINE | ID: mdl-20195509

ABSTRACT

The heterochromatic environment and physical clustering of chromosome ends at the nuclear periphery provide a functional and structural framework for antigenic variation and evolution of subtelomeric virulence gene families in the malaria parasite Plasmodium falciparum. While recent studies assigned important roles for reversible histone modifications, silent information regulator 2 and heterochromatin protein 1 (PfHP1) in epigenetic control of variegated expression, factors involved in the recruitment and organization of subtelomeric heterochromatin remain unknown. Here, we describe the purification and characterization of PfSIP2, a member of the ApiAP2 family of putative transcription factors, as the unknown nuclear factor interacting specifically with cis-acting SPE2 motif arrays in subtelomeric domains. Interestingly, SPE2 is not bound by the full-length protein but rather by a 60 kDa N-terminal domain, PfSIP2-N, which is released during schizogony. Our experimental re-definition of the SPE2/PfSIP2-N interaction highlights the strict requirement of both adjacent AP2 domains and a conserved bipartite SPE2 consensus motif for high-affinity binding. Genome-wide in silico mapping identified 777 putative binding sites, 94% of which cluster in heterochromatic domains upstream of subtelomeric var genes and in telomere-associated repeat elements. Immunofluorescence and chromatin immunoprecipitation (ChIP) assays revealed co-localization of PfSIP2-N with PfHP1 at chromosome ends. Genome-wide ChIP demonstrated the exclusive binding of PfSIP2-N to subtelomeric SPE2 landmarks in vivo but not to single chromosome-internal sites. Consistent with this specialized distribution pattern, PfSIP2-N over-expression has no effect on global gene transcription. Hence, contrary to the previously proposed role for this factor in gene activation, our results provide strong evidence for the first time for the involvement of an ApiAP2 factor in heterochromatin formation and genome integrity. These findings are highly relevant for our understanding of chromosome end biology and variegated expression in P. falciparum and other eukaryotes, and for the future analysis of the role of ApiAP2-DNA interactions in parasite biology.


Subject(s)
Chromosomal Proteins, Non-Histone/genetics , Chromosomes/genetics , Gene Expression Regulation/genetics , Plasmodium falciparum/genetics , Protozoan Proteins/metabolism , Transcription Factors/metabolism , Blotting, Southern , Blotting, Western , Chromatin Immunoprecipitation , Chromobox Protein Homolog 5 , Electrophoretic Mobility Shift Assay , Fluorescent Antibody Technique , Genes, Protozoan , Heterochromatin , Reverse Transcriptase Polymerase Chain Reaction
9.
PLoS Pathog ; 5(9): e1000569, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19730695

ABSTRACT

Epigenetic processes are the main conductors of phenotypic variation in eukaryotes. The malaria parasite Plasmodium falciparum employs antigenic variation of the major surface antigen PfEMP1, encoded by 60 var genes, to evade acquired immune responses. Antigenic variation of PfEMP1 occurs through in situ switches in mono-allelic var gene transcription, which is PfSIR2-dependent and associated with the presence of repressive H3K9me3 marks at silenced loci. Here, we show that P. falciparum heterochromatin protein 1 (PfHP1) binds specifically to H3K9me3 but not to other repressive histone methyl marks. Based on nuclear fractionation and detailed immuno-localization assays, PfHP1 constitutes a major component of heterochromatin in perinuclear chromosome end clusters. High-resolution genome-wide chromatin immuno-precipitation demonstrates the striking association of PfHP1 with virulence gene arrays in subtelomeric and chromosome-internal islands and a high correlation with previously mapped H3K9me3 marks. These include not only var genes, but also the majority of P. falciparum lineage-specific gene families coding for exported proteins involved in host-parasite interactions. In addition, we identified a number of PfHP1-bound genes that were not enriched in H3K9me3, many of which code for proteins expressed during invasion or at different life cycle stages. Interestingly, PfHP1 is absent from centromeric regions, implying important differences in centromere biology between P. falciparum and its human host. Over-expression of PfHP1 results in an enhancement of variegated expression and highlights the presence of well-defined heterochromatic boundaries. In summary, we identify PfHP1 as a major effector of virulence gene silencing and phenotypic variation. Our results are instrumental for our understanding of this widely used survival strategy in unicellular pathogens.


Subject(s)
Chromosomal Proteins, Non-Histone/genetics , Plasmodium falciparum/genetics , Protozoan Proteins/genetics , Virulence Factors/genetics , Animals , Cell Nucleus/metabolism , Centromere/metabolism , Chromobox Protein Homolog 5 , Chromosomal Proteins, Non-Histone/metabolism , Chromosomes , Gene Silencing , Genome, Protozoan , Multigene Family , Oligonucleotide Array Sequence Analysis , Phenotype , Plasmodium falciparum/pathogenicity , Protozoan Proteins/metabolism , Reproducibility of Results , Virulence Factors/metabolism
10.
Nucleic Acids Res ; 34(Web Server issue): W104-9, 2006 Jul 01.
Article in English | MEDLINE | ID: mdl-16844970

ABSTRACT

Phylogenetic analysis and examination of protein domains allow accurate genome annotation and are invaluable for studying the evolution of proteins and protein complexes. However, two sequences can be homologous without sharing statistically significant amino acid or nucleotide identity, presenting a challenging bioinformatics problem. We present TreeDomViewer, a visualization tool available as a web-based interface that combines a phylogenetic tree description, a multiple sequence alignment and InterProScan data for the sequences, and generates a phylogenetic tree with the corresponding protein domain information projected onto the multiple sequence alignment. It thereby makes use of existing domain prediction tools such as InterProScan. TreeDomViewer adopts an evolutionary perspective on how the domain structure of two or more sequences can be aligned and compared in order to infer the function of an unknown homolog. This helps assign function to family members that are closely related yet highly divergent in terms of amino acid substitution. Our tool produces an interactive Scalable Vector Graphics (SVG) image that shows the orthologous relationships and domain content of the proteins of interest at a glance. PDF, JPEG and PNG output is also provided. These features make TreeDomViewer a valuable addition to the annotation pipeline for unknown genes or gene products. TreeDomViewer is available at http://www.bioinformatics.nl/tools/treedom/.
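Projecting domain annotations onto a multiple sequence alignment, as described for TreeDomViewer, amounts to translating ungapped residue positions into alignment columns. A minimal sketch, assuming domains are given as 1-based inclusive residue ranges (as in InterProScan's tabular output), that `-` and `.` are the only gap characters, and that each domain label is a single character; the function names are hypothetical and this is not TreeDomViewer's code.

```python
GAP_CHARS = "-."  # assumed gap symbols in the aligned sequence


def residue_columns(aligned_seq: str) -> list[int]:
    """Element i is the 0-based alignment column holding residue i+1 of the ungapped sequence."""
    return [col for col, ch in enumerate(aligned_seq) if ch not in GAP_CHARS]


def project_domain(aligned_seq: str, start: int, end: int) -> tuple[int, int]:
    """Map a 1-based inclusive residue range to 0-based inclusive alignment columns."""
    cols = residue_columns(aligned_seq)
    if not 1 <= start <= end <= len(cols):
        raise ValueError(f"domain {start}-{end} lies outside a {len(cols)}-residue sequence")
    return cols[start - 1], cols[end - 1]


def domain_track(aligned_seq: str, domains: list[tuple[str, int, int]]) -> str:
    """Build one text row under the alignment, filling each domain's column span with its label."""
    track = [" "] * len(aligned_seq)
    for label, start, end in domains:
        first, last = project_domain(aligned_seq, start, end)
        for col in range(first, last + 1):
            track[col] = label  # gap columns inside the span are marked too
    return "".join(track)


aligned = "MK--LVA-GT"
print(aligned)
print(domain_track(aligned, [("D", 2, 5)]))
```

Because every sequence in the alignment is mapped to the same column space, domains from divergent homologs line up visually even when their residue numbering differs.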


Subject(s)
Computer Graphics , Phylogeny , Protein Structure, Tertiary , Proteins/classification , Software , Internet , Proteins/genetics , Sequence Alignment , Sequence Analysis, Protein , Software Design
11.
BMC Bioinformatics ; 6: 51, 2005 Mar 11.
Article in English | MEDLINE | ID: mdl-15760478

ABSTRACT

BACKGROUND: High-throughput microarray analyses yield many differentially expressed genes that are potentially responsible for the biological process of interest. To identify biological similarities between genes, MEDLINE publications were identified in which pairs of gene names, and combinations of gene names with specific keywords, were co-mentioned. RESULTS: MEDLINE search strings for 15,621 known genes and 3,731 keywords were generated and validated. PubMed IDs were retrieved from MEDLINE, and the relative probability of co-occurrence was determined for all gene-gene and gene-keyword pairs. To assess gene clustering according to literature co-publication, 150 genes in 8 sets with known connections (same pathway, same protein complex, same cellular localization, etc.) were run through the program. Receiver operating characteristic (ROC) analyses showed that most gene sets clustered much better than expected by random chance. To test the grouping of genes from real microarray data, 221 differentially expressed genes from a microarray experiment were analyzed with CoPub Mapper, which yielded several relevant clusters of genes with biological process and disease keywords. In addition, all genes were hierarchically clustered against all keywords to reveal a complete grouping of published genes based on co-occurrence. CONCLUSION: The CoPub Mapper program allows quick and versatile querying of co-published genes and keywords and can be used successfully to cluster predefined groups of genes and microarray data.
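The abstract does not give the exact formula behind the "relative probability of co-occurrence". One common way to score literature co-occurrence is an observed-over-expected ratio, n_AB · N / (n_A · n_B), where n_A and n_B are the numbers of PubMed IDs mentioning each term, n_AB the number mentioning both, and N the total number of PubMed IDs. The sketch below uses that ratio as an assumed stand-in with hypothetical function names and invented toy data; it is not necessarily CoPub Mapper's scoring.

```python
def cooccurrence_ratio(pmids_a: set[int], pmids_b: set[int], total_pmids: int) -> float:
    """Observed/expected co-mentions, n_AB * N / (n_A * n_B); 0.0 if either term is never mentioned."""
    if not pmids_a or not pmids_b:
        return 0.0
    observed = len(pmids_a & pmids_b)
    expected = len(pmids_a) * len(pmids_b) / total_pmids  # co-mentions expected under independence
    return observed / expected


def rank_keywords(gene_pmids: set[int], keyword_pmids: dict[str, set[int]],
                  total_pmids: int) -> list[tuple[str, float]]:
    """Score every keyword against one gene and sort from strongest to weakest association."""
    scores = {kw: cooccurrence_ratio(gene_pmids, pmids, total_pmids)
              for kw, pmids in keyword_pmids.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy PubMed ID sets, invented purely for illustration.
gene = {1, 2, 3, 4}
keywords = {"apoptosis": {3, 4, 5}, "angiogenesis": {4, 6, 7, 8, 9}, "osmoregulation": {10}}
print(rank_keywords(gene, keywords, total_pmids=100))  # apoptosis first, osmoregulation last
```

A ratio above 1 means two terms are co-mentioned more often than independence would predict; scores of this kind for all gene-keyword pairs can then feed a standard hierarchical clustering.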


Subject(s)
Computational Biology/methods , Databases, Bibliographic , Oligonucleotide Array Sequence Analysis/methods , Algorithms , Chromosome Mapping , Cluster Analysis , Computer Graphics , Databases, Factual , Databases, Genetic , Expressed Sequence Tags , False Positive Reactions , Gene Expression Profiling , Genes , Humans , Information Storage and Retrieval , MEDLINE , Meta-Analysis as Topic , Models, Molecular , Models, Statistical , Pattern Recognition, Automated , PubMed , ROC Curve , Sequence Alignment , Sequence Analysis, DNA , Software , Subject Headings , User-Computer Interface , Vocabulary, Controlled