Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 46
Filtrar
1.
Nucleic Acids Res ; 45(D1): D555-D559, 2017 01 04.
Artículo en Inglés | MEDLINE | ID: mdl-27924032

RESUMEN

Secondary metabolites produced by microorganisms are the main source of bioactive compounds that are in use as antimicrobial and anticancer drugs, fungicides, herbicides and pesticides. In the last decade, the increasing availability of microbial genomes has established genome mining as a very important method for the identification of their biosynthetic gene clusters (BGCs). One of the most popular tools for this task is antiSMASH. However, so far, antiSMASH is limited to de novo computing results for user-submitted genomes and only partially connects these with BGCs from other organisms. Therefore, we developed the antiSMASH database, a simple but highly useful new resource to browse antiSMASH-annotated BGCs in the currently 3907 bacterial genomes in the database and perform advanced search queries combining multiple search criteria. antiSMASH-DB is available at http://antismash-db.secondarymetabolites.org/.


Asunto(s)
Vías Biosintéticas , Bases de Datos Factuales , Microbiología , Metabolismo Secundario , Vías Biosintéticas/genética , Biología Computacional/métodos , Regulación de la Expresión Génica , Procesamiento Proteico-Postraduccional , Metabolismo Secundario/genética , Navegador Web
2.
BMC Ecol ; 16(1): 49, 2016 10 20.
Artículo en Inglés | MEDLINE | ID: mdl-27765035

RESUMEN

BACKGROUND: Making forecasts about biodiversity and giving support to policy relies increasingly on large collections of data held electronically, and on substantial computational capability and capacity to analyse, model, simulate and predict using such data. However, the physically distributed nature of data resources and of expertise in advanced analytical tools creates many challenges for the modern scientist. Across the wider biological sciences, presenting such capabilities on the Internet (as "Web services") and using scientific workflow systems to compose them for particular tasks is a practical way to carry out robust "in silico" science. However, use of this approach in biodiversity science and ecology has thus far been quite limited. RESULTS: BioVeL is a virtual laboratory for data analysis and modelling in biodiversity science and ecology, freely accessible via the Internet. BioVeL includes functions for accessing and analysing data through curated Web services; for performing complex in silico analysis through exposure of R programs, workflows, and batch processing functions; for on-line collaboration through sharing of workflows and workflow runs; for experiment documentation through reproducibility and repeatability; and for computational support via seamless connections to supporting computing infrastructures. We developed and improved more than 60 Web services with significant potential in many different kinds of data analysis and modelling tasks. We composed reusable workflows using these Web services, also incorporating R programs. Deploying these tools into an easy-to-use and accessible 'virtual laboratory', free via the Internet, we applied the workflows in several diverse case studies. We opened the virtual laboratory for public use and through a programme of external engagement we actively encouraged scientists and third party application and tool developers to try out the services and contribute to the activity. CONCLUSIONS: Our work shows we can deliver an operational, scalable and flexible Internet-based virtual laboratory to meet new demands for data processing and analysis in biodiversity science and ecology. In particular, we have successfully integrated existing and popular tools and practices from different scientific disciplines to be used in biodiversity and ecological research.


Asunto(s)
Biodiversidad , Ecología/métodos , Ecología/instrumentación , Internet , Modelos Biológicos , Programas Informáticos , Flujo de Trabajo
3.
PLoS Biol ; 9(6): e1001088, 2011 Jun.
Artículo en Inglés | MEDLINE | ID: mdl-21713030

RESUMEN

A vast and rich body of information has grown up as a result of the world's enthusiasm for 'omics technologies. Finding ways to describe and make available this information that maximise its usefulness has become a major effort across the 'omics world. At the heart of this effort is the Genomic Standards Consortium (GSC), an open-membership organization that drives community-based standardization activities, Here we provide a short history of the GSC, provide an overview of its range of current activities, and make a call for the scientific community to join forces to improve the quality and quantity of contextual information about our public collections of genomes, metagenomes, and marker gene sequences.


Asunto(s)
Bases de Datos Genéticas , Genómica/normas , Cooperación Internacional , Metagenoma
4.
Nat Chem Biol ; 11(9): 625-31, 2015 Sep.
Artículo en Inglés | MEDLINE | ID: mdl-26284661
5.
Nucleic Acids Res ; 38(Database issue): D391-5, 2010 Jan.
Artículo en Inglés | MEDLINE | ID: mdl-19858098

RESUMEN

Megx.net is a database and portal that provides integrated access to georeferenced marker genes, environment data and marine genome and metagenome projects for microbial ecological genomics. All data are stored in the Microbial Ecological Genomics DataBase (MegDB), which is subdivided to hold both sequence and habitat data and global environmental data layers. The extended system provides access to several hundreds of genomes and metagenomes from prokaryotes and phages, as well as over a million small and large subunit ribosomal RNA sequences. With the refined Genes Mapserver, all data can be interactively visualized on a world map and statistics describing environmental parameters can be calculated. Sequence entries have been curated to comply with the proposed minimal standards for genomes and metagenomes (MIGS/MIMS) of the Genomic Standards Consortium. Access to data is facilitated by Web Services. The updated megx.net portal offers microbial ecologists greatly enhanced database content, and new features and tools for data analysis, all of which are freely accessible from our webpage http://www.megx.net.


Asunto(s)
Biología Computacional/métodos , Bases de Datos Genéticas , Bases de Datos de Ácidos Nucleicos , Bases de Datos de Proteínas , Animales , Biología Computacional/tendencias , Ecología , Ambiente , Genoma Bacteriano , Geografía , Humanos , Almacenamiento y Recuperación de la Información/métodos , Internet , Océanos y Mares , Estructura Terciaria de Proteína , Programas Informáticos
6.
Elife ; 112022 03 31.
Artículo en Inglés | MEDLINE | ID: mdl-35356891

RESUMEN

Genes of unknown function are among the biggest challenges in molecular biology, especially in microbial systems, where 40-60% of the predicted genes are unknown. Despite previous attempts, systematic approaches to include the unknown fraction into analytical workflows are still lacking. Here, we present a conceptual framework, its translation into the computational workflow AGNOSTOS and a demonstration on how we can bridge the known-unknown gap in genomes and metagenomes. By analyzing 415,971,742 genes predicted from 1749 metagenomes and 28,941 bacterial and archaeal genomes, we quantify the extent of the unknown fraction, its diversity, and its relevance across multiple organisms and environments. The unknown sequence space is exceptionally diverse, phylogenetically more conserved than the known fraction and predominantly taxonomically restricted at the species level. From the 71 M genes identified to be of unknown function, we compiled a collection of 283,874 lineage-specific genes of unknown function for Cand. Patescibacteria (also known as Candidate Phyla Radiation, CPR), which provides a significant resource to expand our understanding of their unusual biology. Finally, by identifying a target gene of unknown function for antibiotic resistance, we demonstrate how we can enable the generation of hypotheses that can be used to augment experimental data.


It is estimated that scientists do not know what half of microbial genes actually do. When these genes are discovered in microorganisms grown in the lab or found in environmental samples, it is not possible to identify what their roles are. Many of these genes are excluded from further analyses for these reasons, meaning that the study of microbial genes tends to be limited to genes that have already been described. These limitations hinder research into microbiology, because information from newly discovered genes cannot be integrated to better understand how these organisms work. Experiments to understand what role these genes have in the microorganisms are labor-intensive, so new analytical strategies are needed. To do this, Vanni et al. developed a new framework to categorize genes with unknown roles, and a computational workflow to integrate them into traditional analyses. When this approach was applied to over 400 million microbial genes (both with known and unknown roles), it showed that the share of genes with unknown functions is only about 30 per cent, smaller than previously thought. The analysis also showed that these genes are very diverse, revealing a huge space for future research and potential applications. Combining their approach with experimental data, Vanni et al. were able to identify a gene with a previously unknown purpose that could be involved in antibiotic resistance. This system could be useful for other scientists studying microorganisms to get a more complete view of microbial systems. In future, it may also be used to analyze the genetics of other organisms, such as plants and animals.


Asunto(s)
Bacterias , Genoma Arqueal , Bacterias/genética , Metagenoma , Sistemas de Lectura Abierta
7.
BMC Bioinformatics ; 11: 358, 2010 Jun 30.
Artículo en Inglés | MEDLINE | ID: mdl-20591175

RESUMEN

BACKGROUND: Environmental sequence datasets are increasing at an exponential rate; however, the vast majority of them lack appropriate descriptors like sampling location, time and depth/altitude: generally referred to as metadata or contextual data. The consistent capture and structured submission of these data is crucial for integrated data analysis and ecosystems modeling. The application MetaBar has been developed, to support consistent contextual data acquisition. RESULTS: MetaBar is a spreadsheet and web-based software tool designed to assist users in the consistent acquisition, electronic storage, and submission of contextual data associated to their samples. A preconfigured Microsoft Excel spreadsheet is used to initiate structured contextual data storage in the field or laboratory. Each sample is given a unique identifier and at any stage the sheets can be uploaded to the MetaBar database server. To label samples, identifiers can be printed as barcodes. An intuitive web interface provides quick access to the contextual data in the MetaBar database as well as user and project management capabilities. Export functions facilitate contextual and sequence data submission to the International Nucleotide Sequence Database Collaboration (INSDC), comprising of the DNA DataBase of Japan (DDBJ), the European Molecular Biology Laboratory database (EMBL) and GenBank. MetaBar requests and stores contextual data in compliance to the Genomic Standards Consortium specifications. The MetaBar open source code base for local installation is available under the GNU General Public License version 3 (GNU GPL3). CONCLUSION: The MetaBar software supports the typical workflow from data acquisition and field-sampling to contextual data enriched sequence submission to an INSDC database. The integration with the megx.net marine Ecological Genomics database and portal facilitates georeferenced data integration and metadata-based comparisons of sampling sites as well as interactive data visualization. The ample export functionalities and the INSDC submission support enable exchange of data across disciplines and safeguarding contextual data.


Asunto(s)
Bases de Datos Genéticas , Genómica , Almacenamiento y Recuperación de la Información/métodos , Programas Informáticos , Secuencia de Bases , Internet , Lenguajes de Programación , Interfaz Usuario-Computador , Flujo de Trabajo
8.
BMC Bioinformatics ; 9: 459, 2008 Oct 28.
Artículo en Inglés | MEDLINE | ID: mdl-18957118

RESUMEN

BACKGROUND: Modern sequencing technologies allow rapid sequencing and bioinformatic analysis of genomes and metagenomes. With every new sequencing project a vast number of new proteins become available with many genes remaining functionally unclassified based on evidences from sequence similarities alone. Extending similarity searches with gene pattern approaches, defined as genes sharing a distinct genomic neighbourhood, have shown to significantly improve the number of functional assignments. Further functional evidences can be gained by correlating these gene patterns with prevailing environmental parameters. MetaMine was developed to approach the large pool of unclassified proteins by searching for recurrent gene patterns across habitats based on key genes. RESULTS: MetaMine is an interactive data mining tool which enables the detection of gene patterns in an environmental context. The gene pattern search starts with a user defined environmentally interesting key gene. With this gene a BLAST search is carried out against the Microbial Ecological Genomics DataBase (MEGDB) containing marine genomic and metagenomic sequences. This is followed by the determination of all neighbouring genes within a given distance and a search for functionally equivalent genes. In the final step a set of common genes present in a defined number of distinct genomes is determined. The gene patterns found are associated with their individual pattern instances describing gene order and directions. They are presented together with information about the sample and the habitat. MetaMine is implemented in Java and provided as a client/server application with a user-friendly graphical user interface. The system was evaluated with environmentally relevant genes related to the methane-cycle and carbon monoxide oxidation. CONCLUSION: MetaMine offers a targeted, semi-automatic search for gene patterns based on expert input. The graphical user interface of MetaMine provides a user-friendly overview of the computed gene patterns for further inspection in an ecological context. Prevailing biological processes associated with a key gene can be used to infer new annotations and shape hypotheses to guide further analyses. The use-cases demonstrate that meaningful gene patterns can be quickly detected using MetaMine.MetaMine is freely available for academic use from http://www.megx.net/metamine.


Asunto(s)
Algoritmos , Biología Computacional/métodos , Bases de Datos Genéticas , Ecosistema , Familia de Multigenes/genética , Programas Informáticos , Reconocimiento de Normas Patrones Automatizadas
9.
BMC Bioinformatics ; 9: 177, 2008 Apr 01.
Artículo en Inglés | MEDLINE | ID: mdl-18380896

RESUMEN

BACKGROUND: Current sequencing technologies give access to sequence information for genomes and metagenomes at a tremendous speed. Subsequent data processing is mainly performed by automatic pipelines provided by the sequencing centers. Although, standardised workflows are desirable and useful in many respects, rational data mining, comparative genomics, and especially the interpretation of the sequence information in the biological context, demands for intuitive, flexible, and extendable solutions. RESULTS: The JCoast software tool was primarily designed to analyse and compare (meta)genome sequences of prokaryotes. Based on a pre-computed GenDB database project, JCoast offers a flexible graphical user interface (GUI), as well as an application programming interface (API) that facilitates back-end data access. JCoast offers individual, cross genome-, and metagenome analysis, and assists the biologist in exploration of large and complex datasets. CONCLUSION: JCoast combines all functions required for the mining, annotation, and interpretation of (meta)genomic data. The lightweight software solution allows the user to easily take advantage of advanced back-end database structures by providing a programming and graphical user interface to answer biological questions. JCoast is available at the project homepage.


Asunto(s)
Algoritmos , Mapeo Cromosómico/métodos , Sistemas de Administración de Bases de Datos , Bases de Datos Genéticas , Células Procariotas/fisiología , Programas Informáticos , Interfaz Usuario-Computador , Lenguajes de Programación
10.
OMICS ; 12(2): 115-21, 2008 Jun.
Artículo en Inglés | MEDLINE | ID: mdl-18479204

RESUMEN

The Genomic Contextual Data Markup Language (GCDML) is a core project of the Genomic Standards Consortium (GSC) that implements the "Minimum Information about a Genome Sequence" (MIGS) specification and its extension, the "Minimum Information about a Metagenome Sequence" (MIMS). GCDML is an XML Schema for generating MIGS/MIMS compliant reports for data entry, exchange, and storage. When mature, this sample-centric, strongly-typed schema will provide a diverse set of descriptors for describing the exact origin and processing of a biological sample, from sampling to sequencing, and subsequent analysis. Here we describe the need for such a project, outline design principles required to support the project, and make an open call for participation in defining the future content of GCDML. GCDML is freely available, and can be downloaded, along with documentation, from the GSC Web site (http://gensc.org).


Asunto(s)
Bases de Datos Genéticas , Genómica , Lenguajes de Programación
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA