Results 1 - 20 of 101
1.
Nucleic Acids Res ; 42(Database issue): D600-6, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24165880

ABSTRACT

Metagenomics is a relatively recently established but rapidly expanding field that uses high-throughput next-generation sequencing technologies to characterize the microbial communities inhabiting different ecosystems (including oceans, lakes, soil, tundra, plants and body sites). Metagenomics brings with it a number of challenges, including the management, analysis, storage and sharing of data. In response to these challenges, we have developed a new metagenomics resource (http://www.ebi.ac.uk/metagenomics/) that allows users to easily submit raw nucleotide reads for functional and taxonomic analysis by a state-of-the-art pipeline, and have them automatically stored (together with descriptive, standards-compliant metadata) in the European Nucleotide Archive.


Subjects
Genetic Databases , Metagenomics , Gene Expression Profiling , Internet , Metabolomics , Proteomics , Software
2.
Proc Natl Acad Sci U S A ; 110(12): 4651-5, 2013 Mar 19.
Article in English | MEDLINE | ID: mdl-23487761

ABSTRACT

Do bacterial taxa demonstrate clear endemism, like macroorganisms, or can one site's bacterial community recapture the total phylogenetic diversity of the world's oceans? Here we compare a deep bacterial community characterization from one site in the English Channel (L4-DeepSeq) with 356 datasets from the International Census of Marine Microbes (ICoMM) taken from around the globe (ranging from marine pelagic and sediment samples to sponge-associated environments). At the L4-DeepSeq site, increasing sequencing depth uncovers greater phylogenetic overlap with the global ICoMM data. This site contained 31.7-66.2% of operational taxonomic units identified in a given ICoMM biome. Extrapolation of this overlap suggests that 1.93 × 10^11 sequences from the L4 site would capture all ICoMM bacterial phylogenetic diversity. Current technology trends suggest this limit may be attainable within 3 y. These results strongly suggest the marine biosphere maintains a previously undetected, persistent microbial seed bank.


Subjects
Bacteria , Biodiversity , Metagenome , Oceans and Seas , Phylogeny , Water Microbiology
3.
Environ Microbiol ; 17(6): 1884-96, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25404571

ABSTRACT

Earthworms are globally distributed and perform essential roles for soil health and microbial structure. We have investigated the effect of an anthropogenic contamination gradient on the bacterial community of the keystone ecological species Lumbricus rubellus through utilizing 16S rRNA pyrosequencing for the first time to establish the microbiome of the host and surrounding soil. The earthworm-associated microbiome differs from the surrounding environment which appears to be a result of both filtering and stimulation likely linked to the altered environment associated with the gut micro-habitat (neutral pH, anoxia and increased carbon substrates). We identified a core earthworm community comprising Proteobacteria (∼50%) and Actinobacteria (∼30%), with lower abundances of Bacteroidetes (∼6%) and Acidobacteria (∼3%). In addition to the known earthworm symbiont (Verminephrobacter sp.), we identified a potential host-associated Gammaproteobacteria species (Serratia sp.) that was absent from soil yet observed in most earthworms. Although a distinct bacterial community defines these earthworms, clear family- and species-level modification were observed along an arsenic and iron contamination gradient. Several taxa observed in uncontaminated control microbiomes are suppressed by metal/metalloid field exposure, including eradication of the hereto ubiquitously associated Verminephrobacter symbiont, which raises implications to its functional role in the earthworm microbiome.


Subjects
Arsenic/pharmacology , Microbiota/genetics , Oligochaeta/drug effects , Oligochaeta/microbiology , Soil Pollutants/pharmacology , Acidobacteria/genetics , Acidobacteria/isolation & purification , Actinobacteria/genetics , Actinobacteria/isolation & purification , Animals , Bacteroidetes/genetics , Bacteroidetes/isolation & purification , Comamonadaceae/genetics , Comamonadaceae/isolation & purification , Ecosystem , Gammaproteobacteria/genetics , Gammaproteobacteria/isolation & purification , 16S Ribosomal RNA/genetics , Soil/chemistry , Soil Pollutants/analysis
4.
Nat Methods ; 9(6): 621-5, 2012 Apr 15.
Article in English | MEDLINE | ID: mdl-22504588

ABSTRACT

Understanding the interactions between the Earth's microbiome and the physical, chemical and biological environment is a fundamental goal of microbial ecology. We describe a bioclimatic modeling approach that leverages artificial neural networks to predict microbial community structure as a function of environmental parameters and microbial interactions. This method was better at predicting observed community structure than were any of several single-species models that do not incorporate biotic interactions. The model was used to interpolate and extrapolate community structure over time with an average Bray-Curtis similarity of 89.7. Additionally, community structure was extrapolated geographically to create the first microbial map derived from single-point observations. This method can be generalized to the many microbial ecosystems for which detailed taxonomic data are currently being generated, providing an observation-based modeling technique for predicting microbial taxonomic structure in ecological studies.
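
To illustrate the similarity measure named in the abstract above, the following minimal Python sketch computes a Bray-Curtis similarity between an observed and a predicted abundance profile. The function name, the toy abundance values, and the choice to report the result on a 0-100 scale are assumptions made for this example; the abstract does not state the scale of the reported 89.7, and this is not the authors' code.

```python
def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity of two abundance vectors, on a 0-100 scale.

    similarity = 100 * 2 * sum(min(a_i, b_i)) / (sum(a) + sum(b))
    """
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("at least one sample must contain counts")
    # Abundance shared by both samples, taxon by taxon.
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 100.0 * 2.0 * shared / total


# Hypothetical abundances for four taxa (illustrative values only).
observed = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
print(round(bray_curtis_similarity(observed, predicted), 1))  # 95.0
```

Identical profiles score 100 and profiles with no taxa in common score 0, so a value near 90 on this scale would mean the predicted community shares most of its abundance with the observed one.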


Subjects
Bacteria/genetics , Ecosystem , Microbial Interactions , Actinomycetales/physiology , Deltaproteobacteria/physiology , Ecology , Gammaproteobacteria/physiology , Metagenome , Biological Models , Computer Neural Networks , Seawater/microbiology
5.
PLoS Biol ; 9(6): e1001088, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21713030

ABSTRACT

A vast and rich body of information has grown up as a result of the world's enthusiasm for 'omics technologies. Finding ways to describe and make available this information that maximise its usefulness has become a major effort across the 'omics world. At the heart of this effort is the Genomic Standards Consortium (GSC), an open-membership organization that drives community-based standardization activities. Here we provide a short history of the GSC, provide an overview of its range of current activities, and make a call for the scientific community to join forces to improve the quality and quantity of contextual information about our public collections of genomes, metagenomes, and marker gene sequences.


Subjects
Genetic Databases , Genomics/standards , International Cooperation , Metagenome
7.
BMC Ecol ; 13: 16, 2013 Apr 15.
Article in English | MEDLINE | ID: mdl-23587026

ABSTRACT

Biodiversity informatics plays a central enabling role in the research community's efforts to address scientific conservation and sustainability issues. Great strides have been made in the past decade establishing a framework for sharing data, where taxonomy and systematics has been perceived as the most prominent discipline involved. To some extent this is inevitable, given the use of species names as the pivot around which information is organised. To address the urgent questions around conservation, land-use, environmental change, sustainability, food security and ecosystem services that are facing Governments worldwide, we need to understand how the ecosystem works. So, we need a systems approach to understanding biodiversity that moves significantly beyond taxonomy and species observations. Such an approach needs to look at the whole system to address species interactions, both with their environment and with other species. It is clear that some barriers to progress are sociological, basically persuading people to use the technological solutions that are already available. This is best addressed by developing more effective systems that deliver immediate benefit to the user, hiding the majority of the technology behind simple user interfaces. An infrastructure should be a space in which activities take place and, as such, should be effectively invisible. This community consultation paper positions the role of biodiversity informatics, for the next decade, presenting the actions needed to link the various biodiversity infrastructures invisibly and to facilitate understanding that can support both business and policy-makers. The community considers the goal in biodiversity informatics to be full integration of the biodiversity research community, including citizens' science, through a commonly-shared, sustainable e-infrastructure across all sub-disciplines that reliably serves science and society alike.


Subjects
Biodiversity , Computational Biology/instrumentation , Computational Biology/methods , Animals , Ecosystem , Humans , Information Dissemination
8.
Nucleic Acids Res ; 39(Database issue): D7-10, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21097465

ABSTRACT

The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.


Subjects
Factual Databases/standards , Information Dissemination
9.
BMC Bioinformatics ; 13: 42, 2012 Mar 19.
Article in English | MEDLINE | ID: mdl-22429538

ABSTRACT

BACKGROUND: A steep drop in the cost of next-generation sequencing during recent years has made the technology affordable to the majority of researchers, but downstream bioinformatic analysis still poses a resource bottleneck for smaller laboratories and institutes that do not have access to substantial computational resources. Sequencing instruments are typically bundled with only the minimal processing and storage capacity required for data capture during sequencing runs. Given the scale of sequence datasets, scientific value cannot be obtained from acquiring a sequencer unless it is accompanied by an equal investment in informatics infrastructure. RESULTS: Cloud BioLinux is a publicly accessible Virtual Machine (VM) that enables scientists to quickly provision on-demand infrastructures for high-performance bioinformatics computing using cloud platforms. Users have instant access to a range of pre-configured command line and graphical software applications, including a full-featured desktop interface, documentation and over 135 bioinformatics packages for applications including sequence alignment, clustering, assembly, display, editing, and phylogeny. Each tool's functionality is fully described in the documentation directly accessible from the graphical interface of the VM. Besides the Amazon EC2 cloud, we have started instances of Cloud BioLinux on a private Eucalyptus cloud installed at the J. Craig Venter Institute, and demonstrated access to the bioinformatic tools interface through a remote connection to EC2 instances from a local desktop computer. Documentation for using Cloud BioLinux on EC2 is available from our project website, while a Eucalyptus cloud image and VirtualBox Appliance is also publicly available for download and use by researchers with access to private clouds. CONCLUSIONS: Cloud BioLinux provides a platform for developing bioinformatics infrastructures on the cloud. An automated and configurable process builds Virtual Machines, allowing the development of highly customized versions from a shared code base. This shared community toolkit enables application specific analysis platforms on the cloud by minimizing the effort required to prepare and maintain them.


Subjects
Computing Methodologies , Genomics/methods , Animals , Computers , Humans , Sequence Alignment , Software
10.
BMC Bioinformatics ; 13: 141, 2012 Jun 21.
Article in English | MEDLINE | ID: mdl-22720753

ABSTRACT

BACKGROUND: Computing of sequence similarity results is becoming a limiting factor in metagenome analysis. Sequence similarity search results encoded in an open, exchangeable format have the potential to limit the needs for computational reanalysis of these data sets. A prerequisite for sharing of similarity results is a common reference. DESCRIPTION: We introduce a mechanism for automatically maintaining a comprehensive, non-redundant protein database and for creating a quarterly release of this resource. In addition, we present tools for translating similarity searches into many annotation namespaces, e.g. KEGG or NCBI's GenBank. CONCLUSIONS: The data and tools we present allow the creation of multiple result sets using a single computation, permitting computational results to be shared between groups for large sequence data sets.


Subjects
Protein Databases , Software , Computational Biology , Nucleic Acid Databases , Metagenomics , Proteins/chemistry , Proteins/genetics
11.
Bioinformatics ; 26(18): 2354-6, 2010 Sep 15.
Article in English | MEDLINE | ID: mdl-20679334

ABSTRACT

The first open source software suite for experimentalists and curators that (i) assists in the annotation and local management of experimental metadata from high-throughput studies employing one or a combination of omics and other technologies; (ii) empowers users to uptake community-defined checklists and ontologies; and (iii) facilitates submission to international public repositories. AVAILABILITY AND IMPLEMENTATION: Software, documentation, case studies and implementations at http://www.isa-tools.org.


Subjects
Software , Checklist , Documentation
12.
Nucleic Acids Res ; 36(Database issue): D970-6, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18073194

ABSTRACT

The PlantTribes database (http://fgp.huck.psu.edu/tribe.html) is a plant gene family database based on the inferred proteomes of five sequenced plant species: Arabidopsis thaliana, Carica papaya, Medicago truncatula, Oryza sativa and Populus trichocarpa. We used the graph-based clustering algorithm MCL [Van Dongen (Technical Report INS-R0010 2000) and Enright et al. (Nucleic Acids Res. 2002; 30: 1575-1584)] to classify all of these species' protein-coding genes into putative gene families, called tribes, using three clustering stringencies (low, medium and high). For all tribes, we have generated protein and DNA alignments and maximum-likelihood phylogenetic trees. A parallel database of microarray experimental results is linked to the genes, which lets researchers identify groups of related genes and their expression patterns. Unified nomenclatures were developed, and tribes can be related to traditional gene families and conserved domain identifiers. SuperTribes, constructed through a second iteration of MCL clustering, connect distant, but potentially related gene clusters. The global classification of nearly 200 000 plant proteins was used as a scaffold for sorting approximately 4 million additional cDNA sequences from over 200 plant species. All data and analyses are accessible through a flexible interface allowing users to explore the classification, to place query sequences within the classification, and to download results for further study.
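
The graph-based clustering named in the abstract above can be sketched in a few lines of Python. This is a toy, dense-matrix version of the general MCL idea (alternating expansion and inflation of a column-stochastic matrix), not the PlantTribes pipeline: the function names, the default parameter values, and the six-node example graph are all assumptions for illustration. In MCL the inflation parameter usually controls cluster granularity; whether the low, medium and high stringencies in the abstract correspond to particular inflation values is not stated there.

```python
def normalise_columns(m):
    """Scale each column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


def expand(m):
    """Expansion step: square the matrix (a two-step random walk)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, power):
    """Inflation step: raise entries to `power`, then renormalise columns."""
    return normalise_columns([[v ** power for v in row] for row in m])


def mcl(adjacency, inflation=2.0, iterations=50):
    """Return clusters (sorted lists of node indices) for a symmetric graph."""
    n = len(adjacency)
    # Self-loops are a common MCL convention; they also keep column sums > 0.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalise_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Each row that still carries weight lists the members of one cluster.
    clusters = {frozenset(j for j, v in enumerate(row) if v > 1e-6) for row in m}
    clusters.discard(frozenset())
    return sorted(sorted(c) for c in clusters)


def adjacency_from_edges(n, edges):
    """Build a symmetric 0/1 adjacency matrix from an undirected edge list."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1
    return a


# Hypothetical similarity graph: two separate triangles of "genes".
two_triangles = adjacency_from_edges(
    6, [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)])
print(mcl(two_triangles))  # [[0, 1, 2], [3, 4, 5]]
```

On real protein similarity graphs the edges would be weighted and a sparse implementation would be needed; the dense lists here are only meant to make the two steps visible.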


Subjects
Genetic Databases , Plant Genes , Plant Genome , Phylogeny , Plants/classification , Proteins/classification , Internet , Proteins/genetics , Sequence Alignment , User-Computer Interface
13.
Environ Microbiol ; 11(12): 3132-9, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19659500

ABSTRACT

Very few marine microbial communities are well characterized even with the weight of research effort presently devoted to it. Only a small proportion of this effort has been aimed at investigating temporal community structure. Here we present the first report of the application of high-throughput pyrosequencing to investigate intra-annual bacterial community structure. Microbial diversity was determined for 12 time points at the surface of the L4 sampling site in the Western English Channel. This was performed over 11 months during 2007. A total of 182 560 sequences from the V6 hyper-variable region of the small-subunit ribosomal RNA gene (16S rRNA) were obtained; there were between 11 327 and 17 339 reads per sample. Approximately 7000 genera were identified, with one in every 25 reads being attributed to a new genus; yet this level of sampling far from exhausted the total diversity present at any one time point. The total data set contained 17 673 unique sequences. Only 93 (0.5%) were found at all time points, yet these few lineages comprised 50% of the total reads sequenced. The most abundant phylum was Proteobacteria (50% of all sequenced reads), while the SAR11 clade comprised 21% of the ubiquitous reads and approximately 12% of the total sequenced reads. In contrast, 78% of all operational taxonomic units were only found at one time point and 67% were only found once, evidence of a large and transient rare assemblage. This time series shows evidence of seasonally structured community diversity. There is also evidence for seasonal succession, primarily reflecting changes among dominant taxa. These changes in structure were significantly correlated to a combination of temperature, phosphate and silicate concentrations.
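
The occupancy figures quoted in the abstract above (share of sequences seen at every time point, share of reads they account for, share seen at only one time point) can be derived from a count table as in the minimal Python sketch below. The function name, the dictionary layout, and the toy counts are assumptions made for illustration and do not reproduce the study's data; only the final line uses numbers taken from the abstract.

```python
def summarize_occupancy(counts):
    """Summarise how widely each OTU occurs across time points.

    `counts` maps an OTU id to a list of read counts, one per time point.
    Returns percentages of OTUs and of reads, as floats.
    """
    n_otus = len(counts)
    total_reads = sum(sum(row) for row in counts.values())
    # OTUs with at least one read at every time point.
    ubiquitous = [otu for otu, row in counts.items() if all(c > 0 for c in row)]
    # OTUs detected at exactly one time point.
    single_point = [otu for otu, row in counts.items()
                    if sum(1 for c in row if c > 0) == 1]
    ubiquitous_reads = sum(sum(counts[otu]) for otu in ubiquitous)
    return {
        "ubiquitous_otu_pct": 100.0 * len(ubiquitous) / n_otus,
        "ubiquitous_read_pct": 100.0 * ubiquitous_reads / total_reads,
        "single_point_otu_pct": 100.0 * len(single_point) / n_otus,
    }


# Hypothetical table: four OTUs over three time points.
toy_counts = {
    "otu_a": [50, 40, 60],
    "otu_b": [0, 5, 0],
    "otu_c": [1, 0, 0],
    "otu_d": [0, 0, 4],
}
print(summarize_occupancy(toy_counts))
# {'ubiquitous_otu_pct': 25.0, 'ubiquitous_read_pct': 93.75, 'single_point_otu_pct': 75.0}

# Check of the abstract's own figure: 93 of 17 673 unique sequences.
print(round(100 * 93 / 17673, 1))  # 0.5
```

The toy table shows the same qualitative pattern the abstract reports: a small set of ubiquitous lineages can hold most of the reads while most OTUs appear only once.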


Subjects
Bacteria/classification , Biodiversity , Environmental Monitoring/methods , Seawater/microbiology , Atlantic Ocean , Bacteria/genetics , Phylogeny , Proteobacteria/classification , Proteobacteria/genetics , 16S Ribosomal RNA/genetics , Seasons , Seawater/chemistry , DNA Sequence Analysis
14.
Environ Microbiol ; 11(1): 111-25, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18783384

ABSTRACT

Phosphonates are organic compounds that contain a C-P bond and are a poorly characterized component of the marine phosphorus cycle. They may represent a potential source of bioavailable phosphorus, particularly in oligotrophic conditions. This study has investigated the distribution of the phnA gene which encodes phosphonoacetate hydrolase, the enzyme that mineralizes phosphonoacetate. Using newly designed degenerate primers targeting the phnA gene we analysed the potential for phosphonoacetate utilization in DNA and cDNA libraries constructed from a phytoplankton bloom in the Western English Channel during July 2006. Total RNA was isolated and reverse transcribed and phosphonoacetate hydrolase (phnA) transcripts were PCR amplified from the cDNA with the degenerate primers, cloned and sequenced. Phylogenetic analysis demonstrated considerable diversity with 14 sequence types yielding five unique phnA protein groups. We also identified 28 phnA homologues in a 454-pyrosequencing metagenomic and metatranscriptomic study from a coastal marine mesocosm, indicating that > 3% of marine bacteria in this study contained phnA. phnA homologues were also present in a metagenomic fosmid library from this experiment. Finally, cultures of four isolates of potential coral pathogens belonging to the Vibrionaceae contained the phnA gene. In the laboratory, these isolates were able to grow with phosphonoacetate as sole P and C source. The fact that the capacity to utilize phosphonoacetate was evident in each of the three coastal environments suggests the potential for widespread utilization of this bioavailable P source.


Subjects
Bacteria/classification , Bacteria/metabolism , Phosphonoacetic Acid/metabolism , Seawater/microbiology , Alkaline Phosphatase , Bacteria/genetics , Bacteria/isolation & purification , Bacterial Proteins/genetics , Bacterial DNA/chemistry , Bacterial DNA/genetics , Ribosomal DNA/chemistry , Ribosomal DNA/genetics , Molecular Sequence Data , Phosphoric Monoester Hydrolases/genetics , Phylogeny , DNA Sequence Analysis , Amino Acid Sequence Homology
16.
Nat Biotechnol ; 24(7): 801-3, 2006 Jul.
Article in English | MEDLINE | ID: mdl-16841067

ABSTRACT

Developing and deploying specialized computing systems for specific research communities is achievable, cost effective and has wide-ranging benefits.


Subjects
Computational Biology/methods , Information Storage and Retrieval , Software/supply & distribution , Computational Biology/trends , Computing Methodologies , Information Storage and Retrieval/standards , Software/standards
17.
OMICS ; 12(2): 115-21, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18479204

ABSTRACT

The Genomic Contextual Data Markup Language (GCDML) is a core project of the Genomic Standards Consortium (GSC) that implements the "Minimum Information about a Genome Sequence" (MIGS) specification and its extension, the "Minimum Information about a Metagenome Sequence" (MIMS). GCDML is an XML Schema for generating MIGS/MIMS compliant reports for data entry, exchange, and storage. When mature, this sample-centric, strongly-typed schema will provide a diverse set of descriptors for describing the exact origin and processing of a biological sample, from sampling to sequencing, and subsequent analysis. Here we describe the need for such a project, outline design principles required to support the project, and make an open call for participation in defining the future content of GCDML. GCDML is freely available, and can be downloaded, along with documentation, from the GSC Web site (http://gensc.org).


Subjects
Genetic Databases , Genomics , Programming Languages
18.
OMICS ; 12(2): 157-60, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18564916

ABSTRACT

Increasingly, we are aware as a community of the growing need to manage the avalanche of genomic and metagenomic data, in addition to related data types like ribosomal RNA and barcode sequences, in a way that tightly integrates contextual data with traditional literature in a machine-readable way. It is for this reason that the Genomic Standards Consortium (GSC) formed in 2005. Here we suggest that we move beyond the development of standards and tackle standards compliance and improved data capture at the level of the scientific publication. We are supported in this goal by the fact that the scientific community is in the midst of a publishing revolution. This revolution is marked by a growing shift away from a traditional dichotomy between "journal articles" and "database entries" and an increasing adoption of hybrid models of collecting and disseminating scientific information. With respect to genomes and metagenomes and related data types, we feel the scientific community would be best served by the immediate launch of a central repository of short, highly structured "Genome Notes" that must be standards compliant. This could be done in the context of an existing journal, but we also suggest the more radical solution of launching a new journal. Such a journal could be designed to cater to a wide range of standards-related content types that are not currently centralized in the published literature. It could also support the demand for centralizing aspects of the "gray literature" (documents developed by institutions or communities) such as the call by the GSC for a central repository of Standard Operating Procedures describing the genomic annotation pipelines of the major sequencing centers. We argue that such an "eJournal," published under the Open Access paradigm by the GSC, could be an attractive publishing forum for a broader range of standardization initiatives within, and beyond, the GSC and thereby fill an unoccupied yet increasingly important niche within the current research landscape.


Subjects
Genomics/standards , Guideline Adherence , Publications
19.
OMICS ; 12(2): 123-7, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18479205

ABSTRACT

Given the growing wealth of downstream information, the integration of molecular and non-molecular data on a given organism has become a major challenge. For micro-organisms, this information now includes a growing collection of sequenced genes and complete genomes, and for communities of organisms it includes metagenomes. Integration of the data is facilitated by the existence of authoritative, community-recognized, consensus identifiers that may form the heart of so-called information knuckles. The Genomic Standards Consortium (GSC) is building a mapping of identifiers across a group of federated databases with the aim to improve navigation across these resources and to enable the integration of their information in the near future. In particular, this is possible because of the existence of INSDC Genome Project Identifiers (GPIDs) and accession numbers, and the ability of the community to define new consensus identifiers such as the culture identifiers used in the StrainInfo.net bioportal. Here we outline (1) the general design of the Genomic Rosetta Stone project, (2) introduce example linkages between key databases (that cover information about genomes, 16S rRNA gene sequences, and microbial biological resource centers), and (3) make an open call for participation in this project providing a vision for its future use.


Subjects
Genetic Databases , Genomics , Computational Biology
20.
OMICS ; 12(2): 129-36, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18416669

ABSTRACT

There is an urgent need to capture metadata on the rapidly growing number of genomic, metagenomic and related sequences, such as 16S ribosomal genes. This need is a major focus within the Genomic Standards Consortium (GSC), and Habitat is a key metadata descriptor in the proposed "Minimum Information about a Genome Sequence" (MIGS) specification. The goal of the work described here is to provide a light-weight, easy-to-use (small) set of terms ("Habitat-Lite") that captures high-level information about habitat while preserving a mapping to the recently launched Environment Ontology (EnvO). Our motivation for building Habitat-Lite is to meet the needs of multiple users, such as annotators curating these data, database providers hosting the data, and biologists and bioinformaticians alike who need to search and employ such data in comparative analyses. Here, we report a case study based on semiautomated identification of terms from GenBank and GOLD. We estimate that the terms in the initial version of Habitat-Lite would provide useful labels for over 60% of the kinds of information found in the GenBank isolation_source field, and around 85% of the terms in the GOLD habitat field. We present a revised version of Habitat-Lite defined within the EnvO Environmental Ontology through a new category, EnvO-Lite-GSC. We invite the community's feedback on its further development to provide a minimum list of terms to capture high-level habitat information and to provide classification bins needed for future studies.


Subjects
Genomics , Genetic Databases , Reference Standards