1.
Syst Biol ; 69(6): 1231-1253, 2020 Nov 01.
Article in English | MEDLINE | ID: mdl-32298457

ABSTRACT

Natural history collections are leading successful large-scale projects of specimen digitization (images, metadata, DNA barcodes), thereby transforming taxonomy into a big data science. Yet, little effort has been directed towards safeguarding and subsequently mobilizing the considerable amount of original data generated during the process of naming 15,000-20,000 species every year. From the perspective of alpha-taxonomists, we provide a review of the properties and diversity of taxonomic data, assess their volume and use, and establish criteria for optimizing data repositories. We surveyed 4113 alpha-taxonomic studies in representative journals for 2002, 2010, and 2018, and found an increasing yet comparatively limited use of molecular data in species diagnosis and description. In 2018, of the 2661 papers published in specialized taxonomic journals, molecular data were widely used in mycology (94%), regularly in vertebrates (53%), but rarely in botany (15%) and entomology (10%). Images play an important role in taxonomic research on all taxa, with photographs used in >80% and drawings in 58% of the surveyed papers. The use of omics (high-throughput) approaches or 3D documentation is still rare. Improved archiving strategies for metabarcoding consensus reads, genome and transcriptome assemblies, and chemical and metabolomic data could help to mobilize the wealth of high-throughput data for alpha-taxonomy. Because long-term (ideally perpetual) data storage is of particular importance for taxonomy, energy footprint reduction via less storage-demanding formats is a priority if their information content suffices for the purpose of taxonomic studies. Whereas taxonomic assignments are quasifacts for most biological disciplines, they remain hypotheses pertaining to evolutionary relatedness of individuals for alpha-taxonomy. For this reason, an improved reuse of taxonomic data, including machine-learning-based species identification and delimitation pipelines, requires a cyberspecimen approach: linking data via unique specimen identifiers, and thereby making them findable, accessible, interoperable, and reusable for taxonomic research. This poses both qualitative challenges to adapt the existing infrastructure of data centers to a specimen-centered concept and quantitative challenges to host and connect an estimated ≤2 million images produced per year by alpha-taxonomic studies, plus many millions of images from digitization campaigns. Of the 30,000-40,000 taxonomists globally, many are thought to be nonprofessionals, and capturing the data for online storage and reuse therefore requires low-complexity submission workflows and cost-free repository use. Expert taxonomists are the main stakeholders able to identify and formalize the needs of the discipline; their expertise is needed to implement the envisioned virtual collections of cyberspecimens. [Big data; cyberspecimen; new species; omics; repositories; specimen identifier; taxonomy; taxonomic data.].


Subjects
Classification; Databases, Factual/standards; Animals; Databases, Factual/trends
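
The cyberspecimen idea in the abstract above, linking separate data items through a shared unique specimen identifier, can be illustrated with a minimal sketch. Everything named here (the `DataRecord` type, the `SPEC-…` identifiers, the repository paths) is a hypothetical placeholder and not taken from the article; the sketch only shows the grouping step, not a repository design.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRecord:
    specimen_id: str  # hypothetical unique specimen identifier
    kind: str         # e.g. "image", "dna_barcode", "metadata"
    location: str     # hypothetical repository reference


def build_cyberspecimens(records):
    """Group data records by the specimen identifier they share."""
    linked = defaultdict(list)
    for record in records:
        linked[record.specimen_id].append(record)
    return dict(linked)


# Illustrative records held in two made-up repositories.
records = [
    DataRecord("SPEC-0001", "image", "repo-a/img/17"),
    DataRecord("SPEC-0002", "image", "repo-a/img/18"),
    DataRecord("SPEC-0001", "dna_barcode", "repo-b/seq/904"),
]
cyberspecimens = build_cyberspecimens(records)
kinds = sorted(r.kind for r in cyberspecimens["SPEC-0001"])
```

With these sample records, `cyberspecimens["SPEC-0001"]` collects both the image and the DNA barcode, even though they sit in different repositories.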
2.
Database (Oxford) ; 2020, 2020 Jan 01.
Article in English | MEDLINE | ID: mdl-32283554

ABSTRACT

The Nagoya Protocol on Access and Benefit Sharing is a transparent legal framework that governs access to genetic resources and the fair and equitable sharing of benefits arising from their utilization. Complying with the Nagoya regulations ensures legal use and re-use of data from genetic resources. Providing detailed provenance information and clear re-usage conditions plays a key role in ensuring the re-usability of research data according to the FAIR (findable, accessible, interoperable and re-usable) Guiding Principles for scientific data management and stewardship. Even with the framework provided by the ABS (access and benefit sharing) Clearing House and the support of the National Focal Points, establishing a direct link between the research data from genetic resources and the relevant Nagoya information remains a challenge. This is particularly true for re-using publicly available data. The Nagoya Lookup Service was developed for stakeholders in biological sciences with the aim of facilitating legal and FAIR data management, specifically for data publication and re-use. The service provides up-to-date information on the Nagoya party status for a geolocation provided by GPS coordinates, directing the user to the relevant local authorities for further information. It integrates open data from the ABS Clearing House, Marine Regions, GeoNames and Wikidata. The service is accessible through a REST API and a user-friendly web form. Stakeholders include data librarians, data brokers, scientists and data archivists who may use this service before, during and after data acquisition or publication to check whether legal documents need to be prepared, considered or verified. The service allows researchers to estimate whether genetic data they plan to produce or re-use might fall under Nagoya regulations or not, within the limits of the technology and without constituting legal advice. It is implemented using portable Docker containers and can easily be deployed locally or on a cloud infrastructure. The source code for building the service is available under an open-source license on GitHub, with a functional image on Docker Hub, and can be used by anyone free of charge.


Subjects
Biotechnology/methods; Data Curation/methods; Data Mining/methods; Databases, Genetic; Biotechnology/legislation & jurisprudence; Data Mining/legislation & jurisprudence; Health Information Exchange/legislation & jurisprudence; Humans; International Cooperation; Resource Allocation/legislation & jurisprudence; Resource Allocation/methods
3.
Gigascience ; 4: 27, 2015.
Article in English | MEDLINE | ID: mdl-26097697

ABSTRACT

Ocean Sampling Day was initiated by the EU-funded Micro B3 (Marine Microbial Biodiversity, Bioinformatics, Biotechnology) project to obtain a snapshot of the marine microbial biodiversity and function of the world's oceans. It is a simultaneous global mega-sequencing campaign aiming to generate the largest standardized microbial data set in a single day. This will be achievable only through the coordinated efforts of an Ocean Sampling Day Consortium, supportive partnerships and networks between sites. This commentary outlines the establishment, function and aims of the Consortium and describes our vision for a sustainable study of marine microbial communities and their embedded functional traits.


Subjects
Marine Biology; Biodiversity; Database Management Systems; Metagenomics; Oceans and Seas
4.
Mar Genomics ; 19: 45-6, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25479944

ABSTRACT

Marine metatranscriptome data were generated as part of a study investigating the bacterioplankton communities towards the end of a diatom-dominated spring phytoplankton bloom. This genomic resource article reports a metatranscriptomic dataset from midwinter, before the spring diatom bloom occurred. Up to 58% of all sequences could be assigned to predicted genes. Taxonomic analysis based on expressed 16S ribosomal RNA genes identified Alphaproteobacteria and Gammaproteobacteria as the most active community members.


Subjects
Bacteria/genetics; Eutrophication; Plankton/genetics; Seasons; Transcriptome; Bacteria/metabolism; Base Sequence; Molecular Sequence Data; North Sea; Plankton/metabolism; RNA, Ribosomal, 16S/genetics; Sequence Analysis, RNA
5.
PLoS One ; 8(3): e50869, 2013.
Article in English | MEDLINE | ID: mdl-23516388

ABSTRACT

BACKGROUND: The proportion of conserved DNA sequences with no clear function is steadily growing in bioinformatics databases. Studies of sequence and structural homology have indicated that many uncharacterized protein domain sequences are variants of functionally described domains. If these variants promote an organism's ecological fitness, they are likely to be conserved in the genome of its progeny and the population at large. The genetic composition of microbial communities in their native ecosystems is accessible through metagenomics. We hypothesize that the co-variation of protein domain sequences across metagenomes from similar ecosystems will provide insights into their potential roles and aid further investigation. METHODOLOGY/PRINCIPAL FINDINGS: We calculated the correlation of Pfam protein domain sequences across the Global Ocean Sampling metagenome collection, employing conservative detection and correlation thresholds to limit results to well-supported hits and associations. We then examined intercorrelations between domains of unknown function (DUFs) and domains involved in known metabolic pathways using network visualization and cluster-detection tools. We used a cautious "guilty-by-association" approach, referencing knowledge-level resources to identify and discuss associations that offer insight into DUF function. We observed numerous DUFs associated with photobiologically active domains and prevalent in the Cyanobacteria. Other clusters included DUFs associated with DNA maintenance and repair, inorganic nutrient metabolism, and sodium-translocating transport domains. We also observed a number of clusters reflecting known metabolic associations and cases that predicted functional reclassification of DUFs. CONCLUSION/SIGNIFICANCE: Critically examining domain covariation across metagenomic datasets can grant new perspectives on the roles and associations of DUFs in an ecological setting. Targeted attempts at DUF characterization in the laboratory or in silico may draw from these insights, and opportunities to discover new associations and corroborate existing ones will arise as more large-scale metagenomic datasets emerge.


Subjects
Ecosystem; Metagenome; Metagenomics; Protein Interaction Domains and Motifs/physiology; Seawater/microbiology; Cluster Analysis; Computational Biology/methods; Cyanobacteria/classification; Cyanobacteria/genetics; Cyanobacteria/metabolism; Iron/metabolism; Photosynthesis/physiology
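
The correlation step described in the abstract above (domain abundances compared across metagenomes, with a detection threshold and a correlation threshold) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: Pearson correlation is assumed, the thresholds `min_samples` and `min_r` are arbitrary, and the domain names and counts are invented.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no co-variation signal
    return cov / sqrt(var_x * var_y)


def correlated_pairs(abundance, min_samples=3, min_r=0.9):
    """Return domain pairs that pass both thresholds.

    min_samples: detection threshold (samples with a non-zero count).
    min_r: minimum correlation for a pair to be reported as an edge.
    Both values are illustrative assumptions.
    """
    kept = {
        name: counts
        for name, counts in abundance.items()
        if sum(1 for c in counts if c > 0) >= min_samples
    }
    edges = []
    for a, b in combinations(sorted(kept), 2):
        r = pearson(kept[a], kept[b])
        if r >= min_r:
            edges.append((a, b, round(r, 3)))
    return edges


# Invented counts for four hypothetical domains across five samples.
abundance = {
    "DUF_A": [1, 2, 3, 4, 5],
    "PF_photo": [2, 4, 6, 8, 10],
    "PF_other": [5, 1, 4, 2, 3],
    "DUF_rare": [0, 0, 0, 0, 7],
}
edges = correlated_pairs(abundance)
```

Each returned edge links two domains whose abundances rise and fall together; in a "guilty-by-association" reading, an uncharacterized domain paired with a known one becomes a candidate for follow-up, nothing more.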
6.
PLoS One ; 6(9): e24797, 2011.
Article in English | MEDLINE | ID: mdl-21935468

ABSTRACT

State-of-the-art (DNA) sequencing methods applied in "Omics" studies grant insight into the 'blueprints' of organisms from all domains of life. Sequencing is carried out around the globe and the data are submitted to the public repositories of the International Nucleotide Sequence Database Collaboration. However, the context in which these studies are conducted often gets lost, because experimental data and information about the environment are rarely submitted along with the sequence data. If these contextual data, or metadata, are missing, key opportunities for comparison and analysis across studies and habitats are hampered or even lost. To address this problem, the Genomic Standards Consortium (GSC) promotes checklists and standards to better describe our sequence data collection and to promote the capturing, exchange and integration of sequence data with contextual data. In a recent community effort, the GSC has developed a series of recommendations for contextual data that should be submitted along with sequence data. To help the scientific community significantly enhance the quality and quantity of contextual data in the public sequence data repositories, specialized software tools are needed. In this work we present CDinFusion, a web-based tool to integrate contextual and sequence data in (Multi)FASTA format prior to submission. The tool is open source and available under the GNU Lesser General Public License version 3. A public installation is hosted and maintained at the Max Planck Institute for Marine Microbiology at http://www.megx.net/cdinfusion. The tool may also be installed locally using the open source code available at http://code.google.com/p/cdinfusion.


Subjects
Computational Biology/methods; Software; Databases, Genetic; Genomics
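
To make concrete what integrating contextual data with (Multi)FASTA sequences might involve, here is a minimal sketch that appends key-value tags to every header line. The `[key=value]` tag layout, the function name `annotate_fasta`, and the field names are assumptions chosen for illustration; they are not CDinFusion's actual output format or interface.

```python
def annotate_fasta(fasta_text, context):
    """Append contextual data as [key=value] tags to each FASTA header.

    The tag layout is an illustrative assumption, not a documented format.
    """
    tags = " ".join(f"[{key}={value}]" for key, value in sorted(context.items()))
    annotated_lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">") and tags:
            annotated_lines.append(f"{line} {tags}")
        else:
            # Sequence lines pass through unchanged.
            annotated_lines.append(line)
    return "\n".join(annotated_lines)


# Hypothetical sequences and contextual fields.
fasta = ">seq1\nACGT\n>seq2\nGGCC"
context = {"collection_date": "2010-06-01", "depth": "5 m"}
annotated = annotate_fasta(fasta, context)
```

Keeping the contextual data in the header means it travels with each sequence through submission, which is the point the abstract makes about context otherwise getting lost.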
7.
Microb Inform Exp ; 1(1): 9, 2011 Sep 07.
Article in English | MEDLINE | ID: mdl-22587903

ABSTRACT

BACKGROUND: DNA-binding transcription factors (TFs) regulate cellular functions in prokaryotes, often in response to environmental stimuli. Thus, the environment exerts constant selective pressure on the TF gene content of microbial communities. Recently a study on marine Synechococcus strains detected differences in their genomic TF content related to environmental adaptation, but so far the effect of environmental parameters on the content of TFs in bacterial communities has not been systematically investigated. RESULTS: We quantified the effect of environment stability on the transcription factor repertoire of marine pelagic microbes from the Global Ocean Sampling (GOS) metagenome using interpolated physico-chemical parameters and multivariate statistics. Thirty-five percent of the difference in relative TF abundances between samples could be explained by environment stability. Six percent was attributable to spatial distance but none to a combination of both spatial distance and stability. Some individual TFs showed a stronger relationship to environment stability and space than the total TF pool. CONCLUSIONS: Environmental stability appears to have a clearly detectable effect on TF gene content in bacterioplanktonic communities described by the GOS metagenome. Interpolated environmental parameters were shown to compare well to in situ measurements and were essential for quantifying the effect of the environment on the TF content. It is demonstrated that comprehensive and well-structured contextual data will strongly enhance our ability to interpret the functional potential of microbes from metagenomic data.

8.
Environ Microbiol ; 12(2): 422-39, 2010 Feb.
Article in English | MEDLINE | ID: mdl-19878267

ABSTRACT

Microbial consortia mediating the anaerobic oxidation of methane with sulfate are composed of methanotrophic Archaea (ANME) and Bacteria related to sulfate-reducing Deltaproteobacteria. Cultured representatives are not available for any of the three ANME clades. Therefore, a metagenomic approach was applied to assess the genetic potential of ANME-1 archaea. In total, 3.4 Mbp sequence information was generated based on metagenomic fosmid libraries constructed directly from a methanotrophic microbial mat in the Black Sea. These sequence data represent, in 30 contigs, about 82-90% of a composite ANME-1 genome. The dataset supports the hypothesis of a reversal of the methanogenesis pathway. Indications for an assimilatory, but not for a dissimilatory sulfate reduction pathway in ANME-1, were found. Draft genome and expression analyses are consistent with acetate and formate as putative electron shuttles. Moreover, the dataset points towards downstream electron-accepting redox components different from the ones known from methanogenic archaea. Whereas catalytic subunits of [NiFe]-hydrogenases are lacking in the dataset, genes for an [FeFe]-hydrogenase homologue were identified, not yet described to be present in methanogenic archaea. Clustered genes annotated as secreted multiheme c-type cytochromes were identified, which have not yet been correlated with methanogenesis-related steps. The genes were shown to be expressed, suggesting direct electron transfer as an additional possible mode to shuttle electrons from ANME-1 to the bacterial sulfate-reducing partner.


Subjects
Euryarchaeota/genetics; Euryarchaeota/metabolism; Metagenome; RNA, Messenger/metabolism; Base Sequence; Cytochromes c/genetics; Euryarchaeota/classification; Hydrogenase/genetics; Iron-Sulfur Proteins/genetics; Metagenomics; Methane/metabolism; Molecular Sequence Data; Oceans and Seas; Oxidation-Reduction
9.
BMC Bioinformatics ; 9: 177, 2008 Apr 01.
Article in English | MEDLINE | ID: mdl-18380896

ABSTRACT

BACKGROUND: Current sequencing technologies give access to sequence information for genomes and metagenomes at a tremendous speed. Subsequent data processing is mainly performed by automatic pipelines provided by the sequencing centers. Although standardised workflows are desirable and useful in many respects, rational data mining, comparative genomics, and especially the interpretation of the sequence information in the biological context demand intuitive, flexible, and extendable solutions. RESULTS: The JCoast software tool was primarily designed to analyse and compare (meta)genome sequences of prokaryotes. Based on a pre-computed GenDB database project, JCoast offers a flexible graphical user interface (GUI), as well as an application programming interface (API) that facilitates back-end data access. JCoast offers individual, cross-genome, and metagenome analysis, and assists the biologist in exploration of large and complex datasets. CONCLUSION: JCoast combines all functions required for the mining, annotation, and interpretation of (meta)genomic data. The lightweight software solution allows the user to easily take advantage of advanced back-end database structures by providing a programming and graphical user interface to answer biological questions. JCoast is available at the project homepage.


Subjects
Algorithms; Chromosome Mapping/methods; Database Management Systems; Databases, Genetic; Prokaryotic Cells/physiology; Software; User-Computer Interface; Programming Languages
10.
ISME J ; 1(5): 419-35, 2007 Sep.
Article in English | MEDLINE | ID: mdl-18043661

ABSTRACT

Planctomycetes are widely distributed in marine environments, where they supposedly play a role in carbon recycling. To deepen our understanding of the ecology of this sparsely studied phylum, six planctomycete fosmids from two marine upwelling systems were investigated and compared with all available planctomycete genomic sequences, including the as yet unpublished near-complete genomes of Blastopirellula marina DSM 3645(T) and Planctomyces maris DSM 8797(T). High numbers of sulfatase genes (41-109) were found on all marine planctomycete genomes and on two fosmids (2). Furthermore, C1 metabolism genes otherwise only known from methanogenic Archaea and methylotrophic Proteobacteria were found on two fosmids and all planctomycete genomes, except for 'Candidatus Kuenenia stuttgartiensis'. Codon usage analysis indicated high expression levels for some of these genes. In addition, novel large families of planctomycete-specific paralogs with as yet unknown functions were identified, which are notably absent from the genome of 'Candidatus Kuenenia stuttgartiensis'. The high numbers of sulfatases in marine planctomycetes characterize them as specialists for the initial breakdown of sulfated heteropolysaccharides and indicate their importance for recycling carbon from these compounds. The almost ubiquitous presence of C1 metabolism genes among Planctomycetes, together with codon usage analysis and information from the genomes, suggests a general importance of these genes for Planctomycetes other than formaldehyde detoxification. The notable absence of these genes in 'Candidatus K. stuttgartiensis', plus the surprising lack of almost any planctomycete-specific gene within this organism, reveals an unexpected distinctiveness of anammox bacteria from all other Planctomycetes.


Subjects
Bacteria/genetics; Genome, Bacterial; Seawater/microbiology; Atlantic Ocean; Bacteria/classification; Bacteria/enzymology; Genomic Library; Genomics; Namibia; Oregon; Pacific Ocean; Phylogeny; Sulfatases/genetics